This page describes the nomenclature systems for gene names and for systematic names of genes, Open Reading Frames (ORFs), and chromosomal features of Saccharomyces cerevisiae, as maintained by the Saccharomyces Genome Database.
A Gene Name, for example COX2 or CDC28, is a name conferred on a gene by a researcher. Names may be conferred on the basis of genetic as well as biochemical or molecular characterization of the gene. Thus, Gene Names may be conferred on any type of gene or feature that can be characterized genetically. Most genes having Gene Names are ORFs, but tRNAs and other non-protein coding RNAs have also received Gene Names. In addition, there are named genes in SGD that have not yet been mapped to a physical location on the chromosome.
The official Gene Name of an S. cerevisiae gene is referred to as the Standard Name on an SGD locus page; a name generally becomes the Standard Name upon its publication in a peer-reviewed paper describing the characterization of that gene. A gene name may also be reserved for a locus when publication of the name is upcoming, in which case it is called a Reserved Name. A Reserved Name, if it remains unique and is the first published name, becomes a Standard Name upon its publication. Where it is not clear which name should be the standard name, the Standard Name is determined by a combination of 1) consensus of the research community, 2) literature usage, 3) clarity relative to function, and 4) priority in the literature. Any alternative Gene Name is referred to as an Alias.
When naming a gene, the full text of the Gene Naming Guidelines for Saccharomyces cerevisiae should be consulted. An explanation of the conventions for Saccharomyces cerevisiae nomenclature was published in the Trends in Genetics gene nomenclature guide. The table below also describes the appropriate format for a Gene Name.
The Systematic Name is the name generated by the systematic sequencing project, or conferred later according to the appropriate guidelines for systematic nomenclature for that type of new feature or gene.
Every gene or feature, whether a protein-coding Open Reading Frame (ORF) or an RNA gene, that was called by the systematic sequencing project received a Systematic Name. Features that could be predicted by computational methods were called by the systematic sequencing project; these include nuclear-encoded ORFs, mitochondrially-encoded ORFs, and tRNAs.
Guidelines for designating a Systematic Name for a new feature, i.e. one not originally named by the systematic sequencing project, depend on the type of feature (ORF, tRNA, etc.). The table below describes the appropriate format for a Systematic Name.
While all ORFs in SGD have a Systematic Name, e.g. YAL001C, YGR116W, YAL034W-A, or Q0010, many ORFs have not been given a Gene Name, e.g. a name such as COX2 or CDC28. In addition, Gene Names have been conferred on non-ORF features such as tRNAs, e.g. SUP61 for tS(CGA)C, other non-coding RNAs including the RNA component of telomerase, e.g. TLC1, and on genetic loci which have not yet been mapped to a specific position on a chromosome. In this last case, because the chromosomal location is not known, there will not be a systematic name associated with the Gene Name.
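Because the ORF formats described in the table below are position-coded, an ORF Systematic Name can be decoded mechanically. The following Python sketch is illustrative only: `parse_orf_name` and `OrfName` are hypothetical names, not an SGD tool. It assumes that the 16 nuclear chromosomes map to the letters 'A' through 'P' and that ORFs added later carry a single-letter suffix; it checks only the shape of a name, not whether such an ORF exists.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Nuclear ORF: 'Y', chromosome letter A-P (chr I-XVI), arm L/R, three-digit
# order number, strand W/C, and an optional '-<letter>' for ORFs added later.
NUCLEAR_ORF = re.compile(r"Y([A-P])([LR])(\d{3})([WC])(?:-([A-Z]))?")
# Mitochondrial ORF: 'Q' followed by a four-digit number.
MITO_ORF = re.compile(r"Q(\d{4})")

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


@dataclass
class OrfName:
    chromosome: str        # Roman numeral, or "mito" for the mitochondrion
    arm: Optional[str]     # "left" or "right" of the centromere (nuclear only)
    number: int            # order from the centromere, or the Q number
    strand: Optional[str]  # "Watson" or "Crick" (nuclear only)
    suffix: Optional[str]  # letter after the hyphen, e.g. "A" in YAL034W-A


def parse_orf_name(name: str) -> Optional[OrfName]:
    """Decode an ORF Systematic Name; return None if it does not fit."""
    m = NUCLEAR_ORF.fullmatch(name)
    if m:
        chrom, arm, number, strand, suffix = m.groups()
        return OrfName(
            chromosome=ROMAN[ord(chrom) - ord("A")],  # 'A' -> I, 'G' -> VII
            arm="left" if arm == "L" else "right",
            number=int(number),
            strand="Watson" if strand == "W" else "Crick",
            suffix=suffix,
        )
    m = MITO_ORF.fullmatch(name)
    if m:
        return OrfName("mito", None, int(m.group(1)), None, None)
    return None  # e.g. a Gene Name such as CDC28


print(parse_orf_name("YGR116W"))
# OrfName(chromosome='VII', arm='right', number=116, strand='Watson', suffix=None)
print(parse_orf_name("YAL034W-A").suffix)  # A
print(parse_orf_name("CDC28"))  # None
```

Using `fullmatch` rather than `match` keeps trailing characters (or a trailing newline) from slipping through as a valid name.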
An ORF, or other chromosomal feature, with a systematic name may have been associated with more than one common usage name, or Gene Name. Only one of these will be designated as the Standard Name; any other associated name is referred to as an Alias.
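Since one locus may carry a Standard Name plus any number of Aliases, a name lookup has to map every name back to the locus it denotes. The Python sketch below is hypothetical: the `Locus` record and `resolve` function are illustrative, the gene names in the usage lines are made up (not real S. cerevisiae genes), and case-insensitive matching is an assumption. It returns a set so that, if the same alias were attached to more than one locus, all candidates would be reported rather than one being picked silently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Locus:
    """Hypothetical record for one locus."""
    systematic_name: Optional[str]  # None for loci not mapped to a chromosome
    standard_name: Optional[str]    # None if no Gene Name has been conferred
    aliases: List[str] = field(default_factory=list)

    def all_names(self) -> Set[str]:
        # Upper-case every name so lookups are case-insensitive (an assumption).
        names = [self.systematic_name, self.standard_name, *self.aliases]
        return {n.upper() for n in names if n}

    def display_name(self) -> str:
        # Prefer the Standard Name; fall back to the Systematic Name.
        return self.standard_name or self.systematic_name or ""


def resolve(name: str, loci: List[Locus]) -> Set[str]:
    """Return the display name of every locus known by `name`."""
    query = name.upper()
    return {locus.display_name() for locus in loci if query in locus.all_names()}


# Usage with made-up names:
loci = [
    Locus(None, "FOO1", ["BAZ1"]),
    Locus(None, "QUX2", ["BAZ1", "QUX3"]),
]
print(sorted(resolve("baz1", loci)))  # ['FOO1', 'QUX2']
print(resolve("QUX3", loci))  # {'QUX2'}
```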
This table describes the nomenclature conventions, including the formats, for Gene and Systematic Names.
| Gene Nomenclature | ||
|---|---|---|
| Gene Names, for any type of Feature | ||
| Format | Gene Names in S. cerevisiae are generally three letters followed by a number. For detailed guidelines about the criteria for Gene Names in S. cerevisiae, please see our Guidelines when naming S. cerevisiae genes. Different copies of duplicated genes may be indicated by an extension to the end of the Gene Name. This extension can be made either by adding a letter, e.g. 'A' or 'B', as in the case of the ribosomal protein genes, or by adding a hyphen and a number, e.g. '-1', '-2', as in the case of the YRF1 genes encoding the Y'-helicase or the ribosomal RNA genes. Gene Names for many types of chromosomal features follow this basic format, whether the feature is an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus that has never been cloned or mapped to a specific chromosomal location. | |
| Examples | CDC28 - a Gene Name conferred on a nuclear ORF on the basis of genetic characterization; ADH1 - a Gene Name conferred on a nuclear ORF on the basis of biochemical characterization; COX2 - a Gene Name conferred on a mitochondrial ORF on the basis of its enzymatic activity; RPL1A - a Gene Name conferred on one copy of the duplicated gene encoding large ribosomal subunit protein L1; RPL1B - a Gene Name conferred on the other copy of the gene encoding large ribosomal subunit protein L1; SUP61 - a Gene Name conferred on a tRNA on the basis of genetic characterization; TLC1 - a Gene Name conferred on an RNA gene on the basis of its role as the RNA component of telomerase; LSR1 - a Gene Name conferred on the U2 small nuclear RNA (snRNA); SNR3 - a Gene Name conferred on a small nucleolar RNA (snoRNA); RDN18-1 - a Gene Name conferred on a copy of an RNA gene that corresponds to the 18S rRNA; RDN18-2 - a Gene Name conferred on another copy of an RNA gene that corresponds to the 18S rRNA; HRS3 - a Gene Name conferred on a genetic locus whose chromosomal location is not known; ARS1 - a Gene Name conferred on an Autonomously Replicating Sequence (ARS) on the basis of its characterized activity | |
| Systematic Nomenclature for Genes | ||
| Nuclear ORF | ||
| Format | For nuclear-encoded ORFs, the systematic names begin with the letter 'Y'; the second letter corresponds, by its position in the alphabet, to the chromosome number (given in Roman numerals), e.g. chr I is 'A', chr VIII is 'H'; the third letter is either 'L' or 'R' for left or right of the centromere; next is a three-digit number indicating the order of the ORFs on one arm of a chromosome, counting from the centromere; finally, there is an additional letter, either 'W' or 'C', for Watson (the strand whose 5' end is at the left telomere) or Crick (the complementary strand, whose 5' end is at the right telomere). Nuclear ORFs that were not called by the systematic sequencing project when the initial names were assigned receive a systematic name based on that of the centromere-proximal ORF, with a hyphen and a letter (A, B, C, ...) added to indicate their position between previously assigned ORFs. When multiple new ORFs are identified between previously assigned ORFs, the letter assigned to each reflects the order in which they were discovered and is independent of strand. See the Systematic Names - Protein Coding ORFs help page for more explanation (including diagrams and examples). | |
| Examples | YAL001C - first ORF to the left of the centromere on chromosome I (A is the 1st letter of the English alphabet), on the complementary or Crick strand; YGR116W - 116th ORF to the right of the centromere on chromosome VII (G is the 7th letter of the English alphabet), on the Watson strand; YAL034W-A - an ORF not called initially, on the Watson strand of the left arm of chromosome I, further from the centromere than YAL034C | |
| Mitochondrial ORF | ||
| Format | For mitochondrially encoded ORFs, the systematic names start with a 'Q', to designate the mitochondrial chromosome; the rest consists of a four digit number. | |
| Examples | Q0010 - an ORF encoded in the mitochondrion; Q0032 - another ORF encoded in the mitochondrion | |
| Nuclear-encoded tRNA | ||
| Format | For tRNAs, the systematic names begin with a lowercase 't'; the second letter is the single-letter code for the appropriate amino acid, e.g. A = alanine, C = cysteine, etc.; next, the sequence of the tRNA's anticodon is given in the 5' -> 3' direction within parentheses, e.g. (AGC), (GUC); finally, the chromosome on which the tRNA gene resides is indicated using the letters 'A' through 'P' to designate nuclear chromosomes (in the same way as for nuclear-encoded ORFs). If a given nuclear chromosome contains more than one copy of a tRNA gene, individual copies of the same tRNA family (those of identical sequence, including the anticodon sequence) are distinguished from each other by the addition of a single number, starting with '1', after the letter designating the chromosome. | |
| Examples | tC(GCA)B - a tRNA for cysteine, with the anticodon sequence 'GCA', located on chromosome II; tS(AGA)D1 - a tRNA for serine, with the anticodon sequence 'AGA', one of two or more tRNAs from this family (containing the AGA anticodon) located on chromosome IV; tS(AGA)D3 - another tRNA for serine, also with the anticodon sequence 'AGA', one of two or more tRNAs from this family (containing the AGA anticodon) located on chromosome IV | |
| Mitochondrially-encoded tRNA | ||
| Format | Mitochondrially-encoded tRNAs are named the same way as nuclear-encoded tRNAs, except that the letter 'Q' is used to designate the mitochondrial chromosome. In addition, for mitochondrially-encoded tRNAs, the presence of a number indicates that two or more tRNAs carry the same amino acid, though they do not necessarily contain the same anticodon sequence. | |
| Examples | tW(UCA)Q - a tRNA for tryptophan, with the anticodon sequence 'UCA', located on the mitochondrial chromosome; tR(UCU)Q1 - a tRNA for arginine, with the anticodon sequence 'UCU', one of two or more tRNAs for arginine on the mitochondrial chromosome; tR(ACG)Q2 - a tRNA for arginine, with the anticodon sequence 'ACG', one of two or more tRNAs for arginine on the mitochondrial chromosome | |
| snRNAs and snoRNAs | ||
| Format | The systematic name of a small nuclear RNA (snRNA) or small nucleolar RNA (snoRNA) starts with the lowercase letters 'sn', followed by a capital 'R' and then a number. The number is unique but does not convey any positional information. Frequently, the Gene Name of an snRNA or snoRNA is the same as the Systematic Name but in all capitals, i.e. beginning with 'SNR'. Different copies of duplicated genes may be indicated by adding a letter to the end of the name, as in snR17a and snR17b. | |
| Examples | snR6 - an snRNA, the U6 spliceosomal RNA; snR17a - a snoRNA, one of two copies of snoRNA U3; snR17b - a snoRNA, the other copy of snoRNA U3 | |
| Nuclear-encoded rRNA | ||
| Format | The Systematic Names and Gene Names of loci representing the nuclear-encoded rRNA genes are identical. The "loci" representing the rDNA repeats, the rRNA transcripts, and the mature rRNAs are named with the three-letter acronym 'RDN', for Ribosomal DNA. Although S. cerevisiae contains multiple repeats of the ribosomal DNA (rDNA), only two rDNA repeats were sequenced as part of the systematic sequencing project. A more complete explanation of the representation and naming of the rDNA repeats and the rRNAs within them is given on the RDN1 locus page, which represents the entire rDNA region on Chromosome XII. | |
| Examples | RDN1 - the entire 1-2 Mb rDNA region on Chromosome XII, consisting of 100-200 tandem copies of a 9.1 kb repeat that contains the genes for the 5S, 5.8S, 25S and 18S rRNAs; RDN18 - represents the regions that encode the 18S ribosomal RNA; RDN18-1 - represents a specific copy of a region that encodes an 18S ribosomal RNA; RDN37 - represents the regions that encode the primary transcript, which is processed into the 25S, 18S and 5.8S rRNAs; RDN37-2 - represents a specific copy of a region that encodes a primary rRNA transcript, which is processed into the 25S, 18S and 5.8S rRNAs | |
| Systematic Nomenclature for Other Chromosomal Features | ||
| Autonomously Replicating Sequence - ARS | ||
| Format | Autonomously Replicating Sequences (ARS) are named with the three letters 'ARS' followed by a number. ARS features added after October 2000 are named systematically: 'ARS' is followed by one or two digits representing the chromosome, e.g. chromosome I = 1, chromosome II = 2, chromosome X = 10, and then by a further number, starting with '01', designating the particular ARS on that chromosome in the order named. Note that this number merely indicates the order in which the ARS elements were reported and named, and does not necessarily convey any location information relative to other ARS features. Note also that decimal points are NOT used. Some "historical" ARS features were given Gene Names before this systematic naming system was established, e.g. ARS1, ARS2, ARS120; such ARS-based Gene Names give no indication of chromosomal location. | |
| Examples | ARS1 - a historically named ARS (no chromosomal location implied); ARS301 - an ARS on chromosome III; ARS319 - another ARS on chromosome III; ARS601 - an ARS on chromosome VI; ARS1009 - an ARS on chromosome X | |
| Centromere - CEN | ||
| Format | Centromeres are named with the three letters 'CEN' followed by one or two digits to represent the chromosome, e.g. chromosome I = 1, chromosome II = 2, chromosome X = 10. | |
| Examples | CEN1 - the centromere on Chromosome I; CEN2 - the centromere on Chromosome II; CEN10 - the centromere on Chromosome X | |
| Ty Element | ||
| Format | The systematic name of a full-length Ty element starts with a 'Y'; the second letter corresponds to the chromosome number (given in Roman numerals), e.g. chr I is 'A', chr VIII is 'H'; the third letter is either 'L' or 'R' for left or right of the centromere; the fourth letter is either 'W' or 'C' for Watson (the strand whose 5' end is at the left telomere) or Crick (the complementary strand, whose 5' end is at the right telomere); next are the letters 'Ty' followed by a number, 1-5, indicating the type of Ty element. The first Ty element of a given type is indicated with -1; additional full-length Ty elements of the same type on the same chromosome are given a number incremented by one from the previous one. | |
| Examples | YARCTy1-1 - a Ty element of type 1 on the right arm of Chromosome I, on the Crick strand; YCLWTy5-1 - a Ty element of type 5 on the left arm of Chromosome III, on the Watson strand; YDRCTy1-1 - a Ty element of type 1 on the right arm of Chromosome IV, on the Crick strand; YDRCTy1-3 - another Ty element of type 1 on the right arm of Chromosome IV, on the Crick strand; YDRWTy1-4 - another Ty element of type 1 on the right arm of Chromosome IV, this one on the Watson strand | |
| Ty LTR | ||
| Format | The systematic name of a Ty LTR element starts with a 'Y'; the second letter corresponds to the chromosome number (given in Roman numerals), e.g. chr I is 'A', chr VIII is 'H'; the third letter is either 'L' or 'R' for left or right of the centromere; the fourth letter is either 'W' or 'C' for Watson (the strand whose 5' end is at the left telomere) or Crick (the complementary strand, whose 5' end is at the right telomere); next is the name of a Greek letter indicating the type of LTR element, e.g. 'delta', 'sigma', 'tau', 'omega'. The first Ty LTR element of a given type is given the number '1'; additional Ty LTR elements of the same type on the same chromosome are given a number incremented by one from the previous one. | |
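| Examples | YALWdelta1 - a Ty LTR of the delta type on the left arm of Chromosome I, on the Watson strand; YARCdelta8 - another Ty LTR of the delta type, on the right arm of Chromosome I, on the Crick strand; YARWsigma1 - a Ty LTR of the sigma type on the right arm of Chromosome I, on the Watson strand; YBLWtau1 - a Ty LTR of the tau type on the left arm of Chromosome II, on the Watson strand; YCLWomega1 - a Ty LTR of the omega type on the left arm of Chromosome III, on the Watson strand | |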
| Telomeric Elements | ||
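| Format | SGD currently annotates several types of features at the ends of chromosomes. As the examples show, each name begins with 'TEL', a two-digit chromosome number (e.g. 08 for chromosome VIII), and 'L' or 'R' for the left or right arm. A specific element within the telomeric region is indicated by a hyphen and a two-letter code, e.g. 'XC' (X element Core sequence), 'XR' (X element Combinatorial Repeat), 'YP' (Y' element), or 'TR' (Telomeric Repeat), followed in some cases by a number counting from the end of the chromosome. | |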
| Examples | TEL08L - the entire telomeric region on the left arm of chromosome VIII (encompasses all the other telomeric elements on that arm); TEL08L-XC - the X element Core sequence located on the left arm of chromosome VIII; TEL08R-XR - the X element Combinatorial Repeat located on the right arm of chromosome VIII; TEL12L-YP1 - the first (closest to the end of the chromosome) Y' element located on the left arm of chromosome XII; TEL12L-YP2 - the second Y' element located on the left arm of chromosome XII; TEL08R-TR1 - the first (closest to the end of the chromosome) Telomeric Repeat located on the right arm of chromosome VIII; TEL08R-TR2 - the second Telomeric Repeat located on the right arm of chromosome VIII | |
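The systematic formats in the table above are regular enough to recognize with one whole-name pattern per feature type. The Python sketch below is a hypothetical classifier: `classify_systematic_name` and its type labels are illustrative, not SGD names. The patterns are transcribed from the table and its examples, and several details are assumptions: anticodons are taken to be three of A/C/G/U, any lowercase Greek-letter word is accepted for Ty LTRs, and telomeric element codes are taken to be any two capital letters. It recognizes only the shape of a name; it cannot tell a historical ARS Gene Name such as ARS120 from a systematic one, and it does not check that the named feature exists.

```python
import re
from typing import Optional

# One whole-name pattern per feature type, transcribed from the table.
# Patterns are tried in order; the first full match wins.
PATTERNS = [
    ("nuclear ORF", r"Y[A-P][LR]\d{3}[WC](?:-[A-Z])?"),
    ("mitochondrial ORF", r"Q\d{4}"),
    # Amino-acid letter, anticodon (5'->3') in parentheses, chromosome letter
    # (A-P nuclear, Q mitochondrial), optional copy number.
    ("nuclear tRNA", r"t[A-Z]\([ACGU]{3}\)[A-P]\d*"),
    ("mitochondrial tRNA", r"t[A-Z]\([ACGU]{3}\)Q\d*"),
    ("snRNA/snoRNA", r"snR\d+[a-z]?"),
    ("rRNA locus", r"RDN\d+(?:-\d+)?"),
    ("ARS", r"ARS\d+"),
    ("centromere", r"CEN(?:[1-9]|1[0-6])"),
    ("Ty element", r"Y[A-P][LR][WC]Ty[1-5]-\d+"),
    ("Ty LTR", r"Y[A-P][LR][WC][a-z]+\d+"),
    ("telomeric element", r"TEL(?:0[1-9]|1[0-6])[LR](?:-[A-Z]{2}\d*)?"),
]
COMPILED = [(kind, re.compile(pattern)) for kind, pattern in PATTERNS]


def classify_systematic_name(name: str) -> Optional[str]:
    """Return the feature type whose naming pattern `name` fits, or None."""
    for kind, pattern in COMPILED:
        if pattern.fullmatch(name):
            return kind
    return None  # e.g. a Gene Name such as CDC28


for example in ["YAL034W-A", "tW(UCA)Q", "snR17a", "ARS1009",
                "YDRWTy1-4", "YCLWomega1", "TEL12L-YP2", "CDC28"]:
    print(example, "->", classify_systematic_name(example))
```

Order matters only where patterns could overlap; here the Ty element pattern requires the uppercase 'Ty', so the lowercase Greek-letter pattern for LTRs never captures a Ty element name.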