The answers to some frequently asked questions can be found on this page. Use the broad categories listed below to jump to the section relevant to your question. If you cannot find the answer you are looking for, you can send an email to the TAIR curators at curator@arabidopsis.org.
If you are accessing TAIR off site, ensure that you are accessing it through your library proxy, as you would any other library resource.
If you are accessing TAIR on site, please contact your librarian with the IP address of the computer you are using to ensure that your address is within the registered ranges for your institution.
If you are having a software-related problem with the TAIR website/database (a problem accessing content, not with the content itself), please submit a bug report and include:
Please go to the Downloads directory > Genes, where you will find subdirectories for all of the genome releases. These lists are available in different formats, such as GFF and GTF, and some contain specific subsets of genes, such as all genes whose structures changed between the TAIR9 and TAIR10 releases, or all new gene models.
Genome assembly refers to the ordered nucleotide sequence of each chromosome of the reference genome. The current version is called TAIR10. Genome annotation refers to the gene calls/coordinates on the reference genome. The current version is called Araport11.
The Arabidopsis genome was initially annotated by the Arabidopsis Genome Initiative (AGI) and later reannotated by TIGR in collaboration with MIPS and TAIR. TAIR assumed primary responsibility for maintaining the Arabidopsis genome annotation in North America following TIGR's final release (TIGR5), producing 5 additional genome releases, TAIR6 through TAIR10. As of 2014, genome annotation was handled by Araport (Araport11, released 2016).
Where can I find a list of GenBank accessions that correspond to AGI locus identifiers?
A file containing the mapping can be downloaded from the TAIR ftp site TAIR10_genome_release.
Please do not self-assign AGI locus IDs; contact us first.
There are several reasons why the coordinates of various gene/genome features can differ:
TAIR curators annotate Arabidopsis genes using Gene Ontology terms to describe a gene's molecular function, subcellular localization and biological process. GO annotations in TAIR also include contributions from the TAIR community, UniProt and the GO consortium. To obtain all of the GO annotations for a set of genes:
TAIR's datasets include: AGI transcript, peptide, cds and gene sequences; introns only; insertion flank sequences; locus upstream and downstream sequences and more. For a full listing of the available data sets see: Datasets. These data sets are common to BLAST and PatMatch. You can download TAIR BLAST datasets from the Downloads> Sequences directory.
TAIR's algorithm for determining if a sequence is nucleotide or amino acid uses the following rules:
owser you will need to open the downloaded file using Safari. Alternatively, until this problem is resolved, you can choose to have the BLAST results returned via email, or use another browser on OSX such as Internet Explorer or Netscape.
Currently we enforce limits on the size of queries (based on input file size and the dataset being queried) to ensure that large queries don't clog up the server. We are working on hardware updates that will enable us to modify these restrictions. If you need to process many sequences, you can create your own local install of BLAST and use the TAIR BLAST datasets.
As of 2016, the current assembly (ordered sequence of nucleotides) for the reference genome is TAIR10, which is the same as TAIR9. In 2024 the assembly will be replaced with the Col-CC assembly.
Converting genetic locations to sequence locations will only give an approximate correlation. This is because the conversion depends upon both the genetic map used and the frequency of recombination, which is variable within the genome. The most accurate estimate would be obtained by comparing the Lister and Dean RI map to the genome sequence (AGI map) for a global genome comparison value. You can also visually align genetic and sequence-based maps using the MapViewer tool, by aligning common markers from the genetic and sequence maps.
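As a rough illustration of using markers common to both maps, the sketch below estimates a sequence position by linear interpolation between the two flanking markers. This is not a TAIR tool or method. The function name `estimate_bp` and the `(cM, bp)` marker list are hypothetical, and the approach assumes recombination is uniform between neighbouring markers, which is exactly why the result is only approximate.

```python
from bisect import bisect_right


def estimate_bp(cm: float, markers: list[tuple[float, int]]) -> float:
    """Estimate a base-pair position for a genetic position given in cM.

    markers: (cM, bp) pairs for markers placed on BOTH maps of one
    chromosome, sorted by strictly increasing cM (hypothetical input format).
    """
    positions = [m[0] for m in markers]
    if len(markers) < 2 or not positions[0] <= cm <= positions[-1]:
        raise ValueError("need flanking markers on both sides of the query")
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("marker cM positions must be strictly increasing")

    # Index of the marker to the right of the query (clamped at the last marker).
    i = min(bisect_right(positions, cm), len(markers) - 1)
    (cm_lo, bp_lo), (cm_hi, bp_hi) = markers[i - 1], markers[i]

    # Assume a constant bp-per-cM ratio between the two flanking markers.
    fraction = (cm - cm_lo) / (cm_hi - cm_lo)
    return bp_lo + fraction * (bp_hi - bp_lo)
```

The closer together the flanking markers are, the less the uniform-recombination assumption matters, so a dense set of shared markers should give a better estimate than a single genome-wide ratio.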
Please note that TAIR stopped accepting new microarray data submissions in June 2005. Newer and more comprehensive microarray data sets are available at GEO, ArrayExpress and NASCArrays.
You can now search for microarray experiments directly from the Microarray Elements Search. In addition, the Expression Viewer display now has direct links to the microarray experiment details. Click on the name of the hybridization in the Expression Viewer to display the information about the experiment.
The raw data files for microarray experiments are large and have therefore been compressed. To uncompress the microarray data files (with the .gz suffix), do the following. On Mac/Unix/Linux, from the command line in a terminal window, type gzip -d /home/yourname/yourpath/filename.gz. For example: gzip -d /home/frank/franksfiles/ciw_2000.gz. On PCs, download the WinZip utility to decompress the files.
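If you would rather script the decompression, or need something that works the same way on any platform, the following is a minimal sketch using Python's standard gzip module. The helper name `decompress_gz` is an illustration only and is not part of any TAIR tool.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path: str) -> Path:
    """Write the decompressed contents next to gz_path and return the new path."""
    source = Path(gz_path)
    if source.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {source.name}")
    target = source.with_suffix("")  # e.g. ciw_2000.gz -> ciw_2000
    with gzip.open(source, "rb") as compressed, open(target, "wb") as plain:
        # Copy in chunks so large raw data files are never held fully in memory.
        shutil.copyfileobj(compressed, plain)
    return target
```

For example, decompress_gz("/home/frank/franksfiles/ciw_2000.gz") would write /home/frank/franksfiles/ciw_2000. Unlike gzip -d, this sketch leaves the original .gz file in place.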
In the Downloads section under the Maps directory.
The difference in the position of the SNP provided by Cereon may be due to a difference in the length of the BAC reported in different versions of the BAC sequence. For example, many of the BAC sequences used by the AGI for the genome sequence had sequences at the ends trimmed, resulting in a shorter BAC. To obtain an accurate location on the genome try one of the following:
Bulk download functions have been integrated into the Gene Search and Protein Search. See the Bulk Downloading Gene Data help document for specifics.
If you are a subscriber, the most up-to-date data is available through the website on locus and other detail pages as well as through the bulk download tools. We also prepare quarterly data releases, which include up-to-date gene names, locus summaries, Gene Ontology annotations, Plant Ontology annotations, and publications. Subscribers can access the most recent releases here. Year-old quarterly releases are available here for non-subscribed institutions/users.
Tab-delimited files from the search results or Downloads site can easily be opened using a spreadsheet program such as Microsoft Excel or a text editor such as BBEdit.
Each of the search results pages includes the option to obtain a listing of specific records or a set of records that you can download and open in a spreadsheet. Information about the downloaded fields can be found in the help pages. If there is a specific set of data that you would like, contact the curators and we will do our best to accommodate your request.
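The same files can also be read programmatically. The sketch below uses Python's standard csv module and assumes that the first line of the file is a header row naming the columns. That is not true of every download, so check the file first. The function name `read_tab_delimited` is hypothetical.

```python
import csv


def read_tab_delimited(path: str) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the column names in the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        # delimiter="\t" switches the csv reader from commas to tabs.
        return list(csv.DictReader(handle, delimiter="\t"))
```

Each row is then available by column name, for example rows[0]["some_column"], where "some_column" stands for whatever header the downloaded file actually uses.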
Please email a PDF/Word description of your job listing and/or the URL of the posting on your website to curator@arabidopsis.org. We will put up the ad at no charge. Job postings are also routinely obtained as they are listed on the TAIR newsgroup. Your posting will appear on TAIR shortly after it is sent out to the newsgroup.
Jobs at TAIR are posted in the TAIR news section.
Data in TAIR is constantly being updated and new data is constantly being added. Information about significant updates and additions is posted in the Breaking News section of the website as well as in the TAIR news section. If you are registered at TAIR, you can choose to receive quarterly news updates sent to your email address (see: Registration Help for information on registration and opting into the email updates).
Datasets that are frequently updated include:
Data Submissions and Corrections
How do I correct information about an incorrectly annotated gene?
I have published a paper. How do I make associated software/supplementary data available at TAIR?
Please contact us to arrange data/software submission.
How do I submit data to TAIR?
See Data Submission section for information on what data types TAIR accepts directly.
Gene nomenclature
Before I name my gene, how can I find out if a gene name is in use?
Consult the Gene Symbol List for a list of gene names that have been reserved and are not available. You should also search for the name in the TAIR database because the registry only contains a fraction of the names in use. UniProt is another source of curated gene names.
Related pages:
TAIR nomenclature guide
What is the difference between a hypothetical, unknown and putative protein?
Putative proteins are similar to a known gene. Unknown proteins are not similar to a known gene but do have EST or cDNA matches showing that they are expressed. Hypothetical proteins have no EST or cDNA matches and are not similar to a known gene, so there is no evidence that they are expressed genes.
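The three definitions above amount to a simple decision rule, sketched below. The function and parameter names are hypothetical and only restate the text. This is not the code any annotation pipeline actually uses.

```python
def classify_protein(similar_to_known_gene: bool, has_est_or_cdna_match: bool) -> str:
    """Return the label that the definitions above assign to a protein."""
    if similar_to_known_gene:
        # Similarity to a known gene is enough for "putative".
        return "putative"
    if has_est_or_cdna_match:
        # No similarity, but EST/cDNA matches show the gene is expressed.
        return "unknown"
    # Neither similarity nor expression evidence.
    return "hypothetical"
```

For instance, a protein with no similarity to a known gene but with a cDNA match would be labelled "unknown".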
FAQs for Developers
Frequently asked questions for software developers and casual programmers
What are the proper procedures for creating hyperlinks to TAIR pages?