This page describes selected files that provide convenient lists and descriptions of genes and features in SGD. The files differ in the type of information presented about each gene (Description of Content), the types of genes included (Scope), and the file Format. In some cases, the content of a file also depends on whether a gene has been assigned a Standard or Reserved gene name (see the Note on Nomenclature and Scope of Files for more information).
Note: This is a selected subset of the data files available from SGD; if you can't find what you're looking for, check our ftp site. README files describing these and additional files are available in each directory of the ftp site.
Click on the file name to download it (in some cases you may have to hold down the control key to view a list of options, one of which will likely allow you to download the file). For files ending in '.tab', Mac users may need to change the '.tab' extension to '.txt' in order to open them in Excel. Click on the README link to get a description of the format of each file in a given directory of the ftp site.

| File Name | Scope (see note) | Format | Description of Content |
|---|---|---|---|
| registry.genenames.tab | Named Genes Only (not all ORFs) | TAB Delimited | Basic information including Standard name and any alias names, Systematic ORF name, SGDID, phenotype, gene product, and a basic description of the gene. |
| registry.genenames.txt | Named Genes Only (not all ORFs) | TEXT | The same information as the registry.genenames.tab file described above, but in a different file format. NOTE: Each piece of information about a gene will be on a separate line; entries for separate genes are separated by blank lines. |
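
The TEXT layout described above (one piece of information per line, blank lines between genes) can be read with a small record splitter. The following is a minimal sketch, not an official SGD parser: the function name `iter_gene_records` is hypothetical, and because the exact content of each line is not specified here, the lines are returned unparsed.

```python
from typing import Iterable, Iterator, List


def iter_gene_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of lines per gene entry.

    Assumes only what the table states: each piece of information is on
    its own line and entries are separated by one or more blank lines.
    """
    record: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip():
            # Non-blank line: part of the current gene entry.
            record.append(line)
        elif record:
            # Blank line closes the current entry (extra blanks are ignored).
            yield record
            record = []
    if record:
        # Last entry when the file does not end with a blank line.
        yield record
```

To use it on a downloaded copy, pass an open file handle, for example `with open("registry.genenames.txt") as fh: records = list(iter_gene_records(fh))`. Check the README in the ftp directory for the meaning of each line before parsing further.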

| File Name | Scope (see note) | Format | Description of Content |
|---|---|---|---|
| SGD_features.tab | All chromosomal features (both ORF and non-ORF features) | TAB Delimited | Comprehensive information about features at SGD, including Gene Name and any alias names, the Systematic Name, the feature type (e.g. ORF, tRNA, etc.), the chromosomal location and coordinates, the genetic position, the SGDID, and a basic description of the gene. Also includes the chromosomal location and coordinates of CDS and introns. NOTE: There is a separate line for each feature, CDS, and intron. ORFs without introns will have a single exon; ORFs with introns will have multiple lines. |
| saccharomyces_cerevisiae.gff | All chromosomal features (both ORF and non-ORF features) | GFF (version 3) | Information about the chromosomal location and coordinates, feature_type, Gene Name and Systematic Name in GFF format (see about GFF and about GFF3 specifically). NOTE: Named protein coding genes will have one line for Gene information (by Gene Name) and one line for each portion of the Coding Sequence (CDS) information (by Systematic Name). Thus for a gene where the protein coding sequence is discontiguous, either due to introns or to translational frameshifting, there will be more than one line representing the coding sequence, i.e. one CDS line for each discrete portion of the coding sequence. |
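
Because the GFF file has one CDS line per discrete portion of a coding sequence, counting CDS lines per gene is a quick way to find genes with a discontiguous coding sequence. The sketch below relies on the general GFF3 layout (nine tab-separated columns, `#` comment lines, an optional `##FASTA` section, `key=value;` attributes). Which attribute carries the Systematic Name on CDS lines is an assumption here, so it is exposed as the `key` parameter (defaulting to `Name`). The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attributes column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[key] = unquote(value)
    return attrs


def count_cds_segments(lines: Iterable[str], key: str = "Name") -> Counter:
    """Count CDS lines per value of the chosen attribute.

    ASSUMPTION: `key` names the attribute that identifies the gene on a
    CDS line; inspect the file and adjust (e.g. "Parent") if needed.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # Sequence data follows; no more feature lines.
        if not line.strip() or line.startswith("#"):
            continue  # Skip blank lines, comments, and directives.
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        label = parse_attributes(cols[8]).get(key)
        if label is not None:
            counts[label] += 1
    return counts
```

Entries with a count greater than one correspond to coding sequences split by introns or translational frameshifting, as described in the table.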

| File Name | Scope (see note) | Format | Description of Content |
|---|---|---|---|
| gene_association.sgd.gz | Gene products (protein coding genes and RNA genes) | TAB Delimited | Complete information about all GO annotations assigned to genes in SGD: the Gene Name, Systematic Name, and other Alias names for the gene annotated and its SGDID; the GO ID # of the GO term to which the gene product is annotated; whether a 'NOT' qualifier is associated with the annotation; the evidence code; any With or From information associated with the annotation; additional information required by the Gene Ontology Consortium annotation file format specifications. |
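
A common first step with the GO annotation file is to separate annotations that carry a 'NOT' qualifier from the rest. The sketch below assumes the column order of the Gene Ontology Consortium annotation (GAF) format: object ID in column 2, qualifier in column 4, GO ID in column 5, and header lines beginning with `!`. Confirm these positions against the README before relying on them. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set

# Zero-based column positions, ASSUMED from the GO Consortium GAF layout.
OBJECT_ID_COL = 1   # database object ID (the SGDID in this file)
QUALIFIER_COL = 3   # may hold several values joined by "|"
GO_ID_COL = 4


def iter_annotations(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each annotation line as a list of tab-separated fields."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # Header/comment lines start with "!".
        yield line.rstrip("\r\n").split("\t")


def is_negated(row: List[str]) -> bool:
    """True when the annotation carries a NOT qualifier."""
    return "NOT" in row[QUALIFIER_COL].split("|")


def positive_go_ids(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each object ID to the GO IDs it is annotated to without NOT."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for row in iter_annotations(lines):
        if len(row) > GO_ID_COL and not is_negated(row):
            result[row[OBJECT_ID_COL]].add(row[GO_ID_COL])
    return dict(result)
```

The file is gzip-compressed, so it can be read directly with the standard library, for example `with gzip.open("gene_association.sgd.gz", "rt") as fh: go_map = positive_go_ids(fh)` after `import gzip`.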

Note on Nomenclature and Scope of Files

Gene Name vs. Systematic Name

Every gene, whether a protein-coding Open Reading Frame (ORF) or an RNA gene, that was called by the systematic sequencing project received a Systematic Name. There are guidelines for designating a Systematic Name for a new feature, i.e. one not originally named by the systematic sequencing project, depending on the feature type. A Gene Name is conferred by the research community by the publication of a name in a paper describing characterization of a gene. The conventions for writing Saccharomyces cerevisiae gene and allele names and genotypes were published by Trends in Genetics in the gene nomenclature guide. For detailed descriptions of the formats of Gene and Systematic Names for genes and other chromosomal features in SGD, see the SGD Gene Nomenclature Conventions page. When naming a gene, the full description of the Saccharomyces Gene Naming Guidelines should be consulted.

NOTE: While all ORFs in SGD have a Systematic Name (e.g. YAL001C, YGR116W, YAL034W-A, or Q0010), many have not been given a Gene Name, i.e. a Standard Name or a Reserved Name of the form COX2 or CDC28. In addition, Gene Names have been conferred on non-ORF features, such as tRNAs and other non-coding RNAs such as the RNA component of telomerase (TLC1), and on genetic loci which have not yet been mapped to a specific position on a chromosome.

To select the file best suited to your purpose, please be aware of the scope of each file with respect to which genes, ORFs, and other chromosomal features are and are not included.

Scope of Files


| Scope | Type of features included |
|---|---|
| Named Genes Only (not all ORFs) | Files will contain information only about features which have been given a Gene Name, either a Standard Name or a Reserved Name. Thus these files will NOT include information on ORFs (protein coding genes) that have not been given Gene Names, and WILL include information about genetic loci that have never been mapped to a chromosomal position, but which have been given Gene Names. |
| ORFs (protein coding genes only) | Files will contain information about all ORFs (protein coding genes), regardless of whether or not they are also associated with a Gene Name (i.e. a Standard Name or a Reserved Name). |
| Gene products (protein coding genes and RNA genes) | Files will contain information about chromosomal features which correspond to gene products, either protein or RNA products, including ORF (protein coding genes), Ty ORF, tRNA, rRNA, snRNA, snoRNA, and other RNA gene features. Other sequence features (LTR, ARS, Transposon, pseudogene, and CEN) will not be included. |
| All chromosomal features (both ORF and non-ORF features) | Files will contain information about all chromosomal sequence features including ORF (protein coding genes), LTR, tRNA, Ty ORF, snoRNA, ARS, Transposon, pseudogene, rRNA, CEN, RNA gene, and snRNA features. |
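
The scope rules in the table above can be expressed as a small membership check, which may help when deciding whether a given feature should appear in a given file. This is an illustrative sketch only: the function `in_scope`, the short scope labels, and the exact spelling of the feature-type strings are assumptions, and the "ORFs" scope is taken to mean the ORF feature type alone.

```python
from typing import Optional

# Feature types listed in the scope table (spelling ASSUMED to match the files).
GENE_PRODUCT_TYPES = frozenset(
    {"ORF", "Ty ORF", "tRNA", "rRNA", "snRNA", "snoRNA", "RNA gene"}
)
OTHER_FEATURE_TYPES = frozenset({"LTR", "ARS", "Transposon", "pseudogene", "CEN"})
ALL_FEATURE_TYPES = GENE_PRODUCT_TYPES | OTHER_FEATURE_TYPES


def in_scope(scope: str, feature_type: Optional[str], has_gene_name: bool) -> bool:
    """Return True if a feature falls within the named file scope.

    `feature_type` is None for a genetic locus that has a Gene Name but
    has never been mapped to a chromosomal position.
    """
    if scope == "Named Genes Only":
        # Only the presence of a Standard or Reserved Name matters.
        return has_gene_name
    if scope == "ORFs":
        return feature_type == "ORF"
    if scope == "Gene products":
        return feature_type in GENE_PRODUCT_TYPES
    if scope == "All chromosomal features":
        return feature_type in ALL_FEATURE_TYPES
    raise ValueError(f"unknown scope: {scope!r}")
```

For example, an unnamed ORF is in scope for "ORFs" but not for "Named Genes Only", while an unmapped named locus is in scope only for "Named Genes Only".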