The answers to some frequently asked questions can be found on this page. Use the broad categories listed below to jump to the section relevant to your question. If you cannot find the answer you are looking for, you can email the TAIR curators at curator@arabidopsis.org.


Genome Annotations

Where can I find a list of coordinates for all genes (including UTRs, introns and exons) and other transcripts?

Please go to ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release to find the complete list of all gene models from the TAIR10 release. These lists are available in formats such as GFF and XML, and some contain specific subsets of genes, such as all genes whose structures changed between releases TAIR9 and TAIR10, or all new gene models. Coordinate information for other sequences such as ESTs and cDNAs, as well as coordinates of markers and polymorphisms, can be found at ftp://ftp.arabidopsis.org/home/tair/Maps/seqviewer_data.
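As an illustration of working with a downloaded GFF file, the following Python sketch pulls coordinates for selected feature types out of tab-delimited GFF lines. It assumes the standard nine-column GFF layout (sequence, source, type, start, end, score, strand, phase, attributes). The function name read_gff_features and the three sample lines are illustrative assumptions, not part of any TAIR tool or file.

```python
import io


def read_gff_features(handle, feature_types=("gene",)):
    """Yield (seqid, type, start, end, strand, attributes) for the chosen feature types."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a standard nine-column feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attributes = fields
        if ftype in feature_types:
            yield seqid, ftype, int(start), int(end), strand, attributes


# Illustrative sample only; a real file would be opened with open(path).
SAMPLE = (
    "##gff-version 3\n"
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010\n"
    "Chr1\tTAIR10\tmRNA\t3631\t5899\t.\t+\t.\tID=AT1G01010.1;Parent=AT1G01010\n"
    "Chr1\tTAIR10\tfive_prime_UTR\t3631\t3759\t.\t+\t.\tParent=AT1G01010.1\n"
)

genes = list(read_gff_features(io.StringIO(SAMPLE)))
utrs = list(read_gff_features(io.StringIO(SAMPLE), ("five_prime_UTR",)))
```

Passing a different tuple of feature types (for example exon or three_prime_UTR, if present in the file) selects other rows in the same way.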

Where do the gene structural annotations in TAIR come from?

The Arabidopsis genome was initially annotated by the Arabidopsis Genome Initiative (AGI) and later reannotated by TIGR in collaboration with MIPS and TAIR. TAIR assumed primary responsibility for maintaining the Arabidopsis genome annotation in North America following TIGR's final release (TIGR5), producing 5 additional genome releases, TAIR6 through TAIR10. As of 2014, genome annotation is being handled by Araport starting with the release of Araport11.

Where can I find a list of GenBank accessions that correspond to AGI locus identifiers?

A file containing the mapping can be downloaded from the TAIR10_genome_release directory on the TAIR FTP site.

Why do AGI loci in TAIR differ from other sources?

This is probably due to differences in annotation methods used by TIGR and MIPS. You can see annotations in TIGR and MIPS by clicking on their respective links in the "External Links" band on TAIR locus/gene detail pages.

Related pages:
MIPS ; MATDB
TIGR ; Arabidopsis Database

Why do the coordinates for a gene/BAC/5'UTR etc... differ in TAIR from what I expect?

There are several reasons why the coordinates of various gene/genome features can differ:

  • The annotation may have changed between versions of the genome release. Currently we are using the TAIR10 release for the SeqViewer and many BLAST datasets.
  • BAC lengths in GenBank differ from AGI BACs. Many of the BAC sequences used by the AGI were trimmed to remove overlaps, whereas clones with the same name in GenBank are untrimmed and longer. Other AGI BACs were extended by TIGR, adding sequences from adjacent BACs for easier annotation of genes near BAC ends. On SeqViewer, the AGI BACs are called Assembly Units.
  • Structural annotations may differ between GenBank and AGI sequences. TAIR builds gene models based on a combination of cDNA and EST sequences from GenBank. A gene's UTR may therefore be extended based on EST alignments even though the corresponding cDNA is shorter.

Related pages:
TAIR Genome Annotation Process
Search Locus History

Are ALL Arabidopsis genes annotated? My gene seems to be missing.


  • Some gene models may not have been predicted by our annotation process because of limitations of the gene prediction software. If you are using BLAST or another sequence similarity search tool on TAIR, choose AGI whole genome (BAC clones), or GenBank whole genome (BAC clones) datasets. Unannotated genes will not be found in AGI transcript, protein, cds or gene data sets.
  • "Missing" genes may reside in one of the few remaining known gaps in the sequence, including highly repetitive regions that are difficult to sequence. See the Genome Update pages for information about gaps, incomplete clone sequences and genome monitoring status.
  • If you have sequenced a gene that is not in the database, please contact TAIR to update the gene's structural and functional data.
  • Some genes that existed in earlier annotation releases may have been made obsolete, merged, or split into 2 new gene models. To find out when a gene was added or removed, or whether a gene has been split or merged, try the TAIR locus history search.

How can I obtain a list of functional categories for a set of genes?

TAIR curators annotate Arabidopsis genes using Gene Ontology terms to describe a gene's molecular function, subcellular localization and biological process. GO annotations in TAIR also include contributions from the TAIR community, UniProt and the GO consortium. To obtain all of the GO annotations for a set of genes:

  • Go to Tools -> Bulk Data Retrieval -> GO Annotations.
  • Paste in or upload a list of locus identifiers for your genes (e.g. AT1G23030).
  • Choose HTML output, or text output for more than 1000 genes.
  • If you save the HTML results as text from your browser, or choose the text output, you can open the file using spreadsheet software such as Microsoft Excel.

Related Pages:
About TAIR GO annotations (Arabidopsis Info -> Ontologies -> Gene Ontology Annotation at TAIR) ; TAIR GO annotations tutorial ; Gene Ontology Consortium

Software/Analysis Tools

What are the specialized datasets available at TAIR for similarity searching (e.g. BLAST, PatMatch)?

TAIR's datasets include: AGI transcript, peptide, cds and gene sequences; introns only; insertion flank sequences; locus upstream and downstream sequences and more. For a full listing of the available data sets see: Datasets. These data sets are common to BLAST and PatMatch.

Can I obtain TAIR software/tools?

Yes you can.

Related pages:
TAIR software and licensing information ; PatMatch Software

How do I interpret the error message I get when I submit a sequence?

TAIR's algorithm for determining if a sequence is nucleotide or amino acid uses the following rules:

  • For sequences submitted as protein: an error is returned if the sequence contains the letters J or O, or if it contains any of (EFILPQXZ) and also contains U.
  • For sequences submitted as protein: sequences that contain ONLY ACGTN, or that contain the letter U but none of (EFILPQXZ), are treated as nucleotide sequences.
  • For sequences submitted as nucleotide: if the sequence contains the letters J or O, or contains any of (EFILPQXZ) and also contains U, it is not considered a valid nucleotide sequence.
  • For sequences submitted as nucleotide: if the sequence contains any of (EFILPQXZ) and does not contain U, it is treated as a protein sequence.
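The rules above can be read as a small decision procedure. The Python sketch below is one interpretation of those rules, not TAIR's actual implementation; the function name classify_sequence and the returned labels ("error", "nucleotide", "protein") are assumptions made for illustration.

```python
PROTEIN_ONLY = set("EFILPQXZ")   # letters that indicate an amino acid sequence
NEVER_VALID = set("JO")          # letters that are rejected in either mode
NUCLEOTIDE_ONLY = set("ACGTN")


def classify_sequence(sequence, submitted_as):
    """Return 'error', 'nucleotide' or 'protein' for a submitted sequence.

    submitted_as is either 'protein' or 'nucleotide'.
    """
    if submitted_as not in ("protein", "nucleotide"):
        raise ValueError("submitted_as must be 'protein' or 'nucleotide'")

    # Ignore case, whitespace, digits and other non-letter characters.
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    has_protein_only = bool(letters & PROTEIN_ONLY)
    has_u = "U" in letters

    # J or O, or a mix of protein-only letters with U, is invalid in both modes.
    if letters & NEVER_VALID or (has_protein_only and has_u):
        return "error"

    if submitted_as == "protein":
        # Only ACGTN, or U without protein-only letters: treated as nucleotide.
        if letters <= NUCLEOTIDE_ONLY or (has_u and not has_protein_only):
            return "nucleotide"
        return "protein"

    # Submitted as nucleotide: protein-only letters without U mean protein.
    if has_protein_only:
        return "protein"
    return "nucleotide"
```

For example, classify_sequence("ACGUACGU", "protein") returns "nucleotide" because the sequence contains U but none of EFILPQXZ, while classify_sequence("MKUL", "protein") returns "error" because L and U occur together.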

...browser you will need to open the downloaded file using Safari. Alternatively, until this problem is resolved, you can choose to have the BLAST results returned via email, or use another browser on OSX such as Internet Explorer or Netscape.

Why can't I upload more sequences into TAIR's NCBI BLAST?

Currently we enforce limits on the size of queries (based on input file size and the dataset being queried) to ensure that some queries don't clog up the server. We are working on hardware updates that will enable us to modify these restrictions.

Related pages:
NCBI BLAST limits.

SeqViewer

What is the most recent version of the reference genome assembly?

As of 2016, the current assembly (ordered sequence of nucleotides) for the reference genome is TAIR10, which is the same as TAIR9. In 2024 the assembly will be replaced with the Col-CC assembly.

How can I convert sequence locations in base pairs to centimorgans (cM) and vice versa?

Converting genetic locations to sequence locations will only give an approximate correlation. This is because the conversion depends upon both the genetic map used and the frequency of recombination, which is variable within the genome. The most accurate estimate would be obtained by comparing the Lister and Dean RI map to the genome sequence (AGI map) for a global genome comparison value. You can also visually align genetic and sequence-based maps using the MapViewer tool, by aligning common markers from the genetic and sequence maps.
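One simple way to turn a set of common markers into an approximate conversion is linear interpolation between the two markers that flank a position. The Python sketch below shows that idea only; it is not a TAIR tool, the function name bp_to_cm is an assumption, and the marker positions are made-up values chosen for illustration. Because recombination frequency varies along the chromosome, the result is a rough estimate.

```python
from bisect import bisect_right


def bp_to_cm(bp, markers):
    """Estimate a genetic position (cM) for a sequence position (bp).

    markers is a list of (bp, cM) pairs for markers placed on both maps,
    sorted by bp, with distinct bp values.
    """
    if len(markers) < 2:
        raise ValueError("need at least two common markers")
    positions = [m[0] for m in markers]
    if not positions[0] <= bp <= positions[-1]:
        raise ValueError("position lies outside the range covered by the markers")

    # Index of the marker to the right of bp, clamped to a valid interval.
    i = min(max(bisect_right(positions, bp), 1), len(markers) - 1)
    (bp0, cm0), (bp1, cm1) = markers[i - 1], markers[i]

    # Linear interpolation between the two flanking markers.
    return cm0 + (bp - bp0) * (cm1 - cm0) / (bp1 - bp0)


# Hypothetical markers: (position in bp, position in cM).
EXAMPLE_MARKERS = [(1_000_000, 5.0), (3_000_000, 11.0), (6_000_000, 20.0)]
estimate = bp_to_cm(2_000_000, EXAMPLE_MARKERS)
```

The reverse direction (cM to bp) follows the same pattern with the two columns swapped, provided the cM values are also strictly increasing.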

MapViewer

How up to date is the Lister/Dean RI map data in the MapViewer?

The RI map was last updated with NASC mapping data from May 2001.

Gene Expression

Please note that TAIR stopped accepting new microarray data submissions in June 2005. Newer and more comprehensive microarray data sets are available at GEO, ArrayExpress and NASCArrays.

Where can I find information about experimental conditions/design of AFGC microarray experiments?

You can now search for microarray experiments directly from the Microarray Elements Search. In addition, the Expression Viewer display now has direct links to the microarray experiment details. Click on the name of the hybridization in the Expression Viewer to display the information about the experiment.

How do I open microarray data files I have downloaded from the TAIR FTP site?

The raw data files for microarray experiments are large and have therefore been compressed. To uncompress the microarray data files (with the .gz suffix), do the following. On Mac/UNIX/Linux, type gzip -d /home/yourname/yourpath/filename.gz at the command line in a terminal window. For example: gzip -d /home/frank/franksfiles/ciw_2000.gz. On PCs, download the WinZip utility to decompress the files.
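If Python is installed, the same decompression can be done on any operating system with the standard library. The sketch below is one possible approach; the function name gunzip is an assumption for illustration. Unlike gzip -d, it leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress path (which must end in .gz) next to the original file.

    Returns the path of the decompressed file.
    """
    source = Path(path)
    if source.suffix != ".gz":
        raise ValueError("expected a file name ending in .gz")
    destination = source.with_suffix("")  # ciw_2000.gz -> ciw_2000

    # Stream the data so that large files are not loaded into memory at once.
    with gzip.open(source, "rb") as compressed, open(destination, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return destination


# Example call (path taken from the text above):
# gunzip("/home/frank/franksfiles/ciw_2000.gz")
```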



Markers and Polymorphisms

Where can I find segregation data for recombinant inbred lines?


  • For the Lister and Dean ColXLer Map go to NASC RI map data and click on "Marker scores for latest map" in the sidebar.
  • For the Koornneef LerXCvi lines go to TAIR's FTP site: FTP -> Maps -> Ler_Cvi_RIdata

Why do the BAC locations given for SNPs and INDELS from the CEREON database differ from their location when I try to locate them on the BAC?

The difference in the position of the SNP provided by Cereon may be due to a difference in the length of the BAC reported in different versions of the BAC sequence. For example, many of the BAC sequences used by the AGI for the genome sequence had sequences at the ends trimmed, resulting in a shorter BAC. To obtain an accurate location on the genome, try one of the following:

  • Use TAIR's BLAST to match the SNP sequence against the AGI whole genome or GenBank whole genome datasets.
  • Use the SeqViewer:
    • Paste the SNP/INDEL sequence into the input box.
    • Choose search by sequence.
    • Hits will be displayed as red lines on the chromosome bars.
    • Click on the chromosome bar to zoom in on the region.

Related pages: SeqViewer ; SeqViewer Help ; BLAST



Bulk Download datasets

Where can I get the most up to date TAIR data sets?

If you are a subscriber, the most up-to-date data is available through the website on locus and other detail pages as well as through the bulk download tools. We also prepare quarterly data releases, which include up-to-date gene names, locus summaries, Gene Ontology annotations, Plant Ontology annotations, and publications. Subscribers can access the most recent releases here. Year-old quarterly releases are available here for non-subscribed institutions and users.

How can I read files I download from TAIR?

Tab-delimited files from the bulk tools or Downloads site can easily be opened using a spreadsheet program such as Microsoft Excel.

  • Open the file in the application and follow the instructions for choosing column delimiters (tabs) and column format (use text as a default for all).
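Tab-delimited files can also be read programmatically. The following Python sketch uses the standard csv module and keeps every value as text, mirroring the advice above to use text as the default column format. The function name read_tab_delimited and the column headers in the sample are assumptions for illustration, not the layout of any particular TAIR file.

```python
import csv
import io


def read_tab_delimited(handle):
    """Return all rows of a tab-delimited file as lists of strings."""
    # QUOTE_NONE: treat quote characters as ordinary text, split on tabs only.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


# Illustrative sample; a real file would be opened with open(path, newline="").
SAMPLE = "Locus\tGene Model\nAT1G23030\tAT1G23030.1\n"
rows = read_tab_delimited(io.StringIO(SAMPLE))
header, records = rows[0], rows[1:]
```

Keeping values as strings avoids the silent conversions (for example of identifiers that look like numbers or dates) that spreadsheet software can apply.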

How do I generate tab delimited files to upload into TAIR bulk search tools?

Spreadsheet programs like Microsoft Excel allow you to save your file as tab delimited text. Excel spreadsheets (with the .xls extension) cannot be used to upload a list into the bulk search pages.
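A plain tab-delimited text file can also be produced without a spreadsheet program. The Python sketch below writes rows with the standard csv module; the function name write_tab_delimited is an assumption, and the locus identifiers are example values only.

```python
import csv
import io


def write_tab_delimited(rows, handle):
    """Write rows (lists of strings) to handle as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Example: a one-column list of locus identifiers, written to memory here.
# To create a file instead: open("loci.txt", "w", newline="") as handle.
buffer = io.StringIO()
write_tab_delimited([["AT1G23030"], ["AT1G01010"]], buffer)
text = buffer.getvalue()
```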

How can I obtain specific datasets such as all sequenced genes, or all markers in a defined region?

Each of the search results pages includes the option to obtain a listing of specific records or a set of records that you can download and open in a spreadsheet. Information about the downloaded fields can be found in the help pages. If there is a specific set of data that you would like, contact the curators and we will do our best to accommodate your request. User requests are placed on the FTP site under the User Requests directory.

Posting Jobs

How do I post a job opening at TAIR?

Please email a PDF/Word description of your job listing and/or the URL of the posting on your website to curator@arabidopsis.org. We will put up the ad at no charge. Job postings are also routinely collected as they are listed on the TAIR newsgroup. Your posting will appear on TAIR shortly after it is sent out to the newsgroup.

Related Pages: The Arabidopsis Newsgroup ; Job Postings

How do I find out about job openings at TAIR?

Jobs at TAIR are posted in the TAIR news section.

Related Pages:
TAIR news

New and updated data

How often is data updated in TAIR?

Data in TAIR is constantly being updated and new data is constantly being added. Information about significant updates and additions is posted in the Breaking News section of the website as well as in the TAIR news section. If you are registered at TAIR, you can choose to receive quarterly news updates sent to your email address (see Registration Help for information on registration and opting into the email updates).

Datasets that are frequently updated include:


  • BLAST and PatMatch datasets: AGI datasets and other datasets derived from the AGI sequences are updated after each major genome release. The current datasets reflect changes from the last TAIR release (TAIR10, November 2010); the next release will be for Araport11.
  • Seed and DNA stocks from ABRC: variable.
  • Publications: downloaded weekly from PubMed.
  • Gene/Locus summary updates: weekly.
  • Gene Ontology annotations: weekly updates from TAIR curation, monthly from the GO consortium.
  • Polymorphisms and phenotypes: weekly updates from TAIR literature curation.

Data Submissions and Corrections

How do I correct information about an incorrectly annotated gene?


I have published a paper. How do I make associated software/ supplementary data available at TAIR?

Please contact us to arrange data/software submission.

How do I submit data to TAIR?

See the Data Submission section for instructions on how to submit Marker/Polymorphism, Gene Family, Functional Genomics Gene Lists and other data to TAIR.

Gene nomenclature

Before I name my gene, how can I find out if a gene name is in use?

Consult the Gene Symbol List for a list of gene names that have been reserved and are not available. You can also use the TAIR quicksearch to search for the name in the database or anywhere on the website (e.g. the list of Arabidopsis gene families).

Related pages:
TAIR nomenclature guide

What is the difference between a hypothetical, unknown and putative protein?

Putative proteins are similar to a known gene. Unknown proteins are not similar to a known gene but do have EST or cDNA matches showing that they are expressed. Hypothetical proteins have no EST or cDNA matches and are not similar to a known gene, so there is no evidence that they are expressed genes.
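The three definitions above amount to a two-question decision. The short Python sketch below restates them; the function name protein_label and its boolean parameters are hypothetical and are used only to make the logic explicit.

```python
def protein_label(similar_to_known_gene, has_est_or_cdna_match):
    """Return 'putative', 'unknown' or 'hypothetical' per the definitions above."""
    if similar_to_known_gene:
        return "putative"
    if has_est_or_cdna_match:
        # Not similar to a known gene, but expression evidence exists.
        return "unknown"
    # Neither similarity nor expression evidence.
    return "hypothetical"
```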

Related Pages:
TIGR naming conventions

FAQs for Developers

Frequently asked questions for software developers and casual programmers

What are the proper procedures for scripting TAIR database pages?

TAIR has many resources that users may access through automated retrieval programs instead of through a web browser. If you intend to use scripts to extract data from TAIR, please note the following:

  • Please contact us at curator@arabidopsis.org with your data requirements to see whether we can supply the data you need through a custom script directly against our database rather than submitting requests through scripts.
  • Do not overload TAIR's systems by submitting requests through multiple threads or programs. Flooding the server with requests can lead to many problems including a denial of service to others trying to use the website.
  • Run retrieval scripts during off hours such as weekends or between 9PM and 5AM Eastern Time on weekdays.
  • Make no more than one request every 3 seconds.
  • TAIR will block, without warning, access from sites that overload the servers with requests.
  • TAIR features are under continuous development. URLs, query syntax and parameters may change without warning.
  • If you receive error messages or no results, please do not rerun your program, but contact us at curator@arabidopsis.org and we will assist you.
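As a sketch of how a single-threaded script might follow the pacing guidelines above, the Python helpers below enforce a minimum gap of 3 seconds between requests and check whether a given Eastern Time falls in the suggested off hours. The names Throttle and is_off_hours are assumptions for illustration; no TAIR URL or API is implied, and the caller is assumed to supply a datetime that is already in Eastern Time.

```python
import time
from datetime import datetime

MIN_INTERVAL_SECONDS = 3.0  # "no more than one request every 3 seconds"


class Throttle:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def is_off_hours(eastern_time):
    """True on weekends, or between 9PM and 5AM on weekdays (Eastern Time)."""
    if eastern_time.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return eastern_time.hour >= 21 or eastern_time.hour < 5
```

A script would call throttle.wait() immediately before each request and issue requests one at a time from a single thread. If a request fails, the guidance above is to stop and contact the curators rather than retry automatically.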

Related Pages:
Hyperlinking to TAIR
