Contents
- FAQs about SGD
- Why hasn't SGD cited my paper?
- How do I propose a gene name?
- How can I download data?
- How should I cite SGD?
- How can I access functional genomics datasets through SGD?
- How can I get more help?
- FAQs about Gene Ontology at SGD
- What is Gene Ontology (GO)?
- How do I find which genes or proteins are annotated to a GO term?
- How can I analyze the GO terms assigned to a set of genes?
- How does SGD assign references for GO terms?
- FAQs about S. cerevisiae
We aim to collect all the available literature for each gene or protein of S. cerevisiae, so if your paper is not listed in SGD, it was missed rather than intentionally excluded. We find papers by searching PubMed, and may miss papers whose titles or abstracts do not contain an S. cerevisiae gene or protein name, or systematic name. To ensure that your paper is linked to the correct gene by SGD (and other databases), it's always a good idea to include the gene name, systematic name, and "S. cerevisiae" in the abstract. If we missed your paper, please drop us an email (yeast-curator@genome.stanford.edu) and we will be happy to add it.
Researchers who want to reserve a new S. cerevisiae gene name do so through SGD. The name is shown on the appropriate SGD Locus Page as 'reserved', and after publication the name becomes the standard gene name. SGD maintains a detailed list of guidelines for choosing and reserving new gene names, and for the resolution of conflicts over gene names. A web form is available for submitting gene name reservations.
Occasionally, existing gene names are changed to more accurately reflect the function or role of the gene product. Such changes are only made if there is consensus among all researchers who have studied the gene. SGD curators can coordinate the process of proposing and discussing gene name changes.
A comprehensive list of data available for download is found on the Download Data page. This page contains a link to the data download directory, containing the most commonly requested data files, organized into folders. A "README" document in this directory describes the contents of each folder. Clicking on a folder name opens it to show the names of the documents it contains. Each folder also contains a "README" document describing the files it contains. Clicking on a file name starts the process of downloading it to your computer. The file can be opened with any program that can handle tab-delimited text, including most word processing and spreadsheet programs.
There are no restrictions on academic use of the data, but they may not be repackaged or redistributed for profit-making enterprises. Contact the SGD Director Mike Cherry (cherry@genome.stanford.edu) for further information.
SGD maintains a list of publications describing SGD, written by SGD staff, that can be used as references to SGD as a database.
For references to the data contained within SGD, original references should be cited wherever possible. For unpublished information, you should get permission directly from the investigator who submitted the data to SGD if there is a contact listed for that information. Further instructions on how to cite SGD and other electronic resources may be found on the How to Cite SGD page.
Function Junction allows users to simultaneously search six different functional analysis project sites for all available functional information for a given gene or ORF. The projects searched are:
Function Junction may be accessed directly from the SGD home page or from its link on the Search Options page. Each Locus Page also has a link to Function Junction in the "Additional Information" section, leading to a Function Junction form already containing the name of the locus from which the tool was accessed. Results for a particular locus for some of these experiments or datasets are also individually accessible from the scroll boxes on the right side of the locus page: the Yeast PathCalling database is accessible from both the Interactions and Functional Analysis scroll boxes; the Yeast Microarray Global Viewer, Yeast Protein Function Assignment, and YGAC Triples Database are accessible from the Functional Analysis scroll box, and the Worm-Yeast Protein Comparison is accessible from the Comparison Resources scroll box.
To use Function Junction, select a set of functional analysis sites to search. Function Junction can search one, several, or all of the databases at a time in any combination. To choose the sites you would like to search, click the gray buttons next to their names. By default, all six sites are searched. Click the "Submit" button to begin the search.
One gene or ORF name may be searched by Function Junction at a time. Type the gene or ORF name into the box at the bottom of the form labeled "Enter a gene or ORF name." To retrieve a set of genes, the wildcard character (*) may be added at the end and/or beginning of the gene or ORF name. For example, to retrieve all the sterile genes, enter "ste*". This retrieves a list of gene names; clicking on any one leads to a Function Junction form for that gene.
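The wildcard rule above (a `*` at the beginning and/or end of a name) can be mimicked locally when filtering a list of gene names. This is a minimal sketch, not Function Junction's own code: the function name `wildcard_search` is hypothetical, case-insensitive matching is an assumption, and the gene list is a small illustrative sample.

```python
def wildcard_search(pattern: str, names: list[str]) -> list[str]:
    """Filter gene names with '*' allowed only at the start and/or end of `pattern`.

    Case-insensitive matching is an assumption made for this sketch.
    """
    p = pattern.upper()
    leading, trailing = p.startswith("*"), p.endswith("*")
    core = p.strip("*")
    if "*" in core:
        raise ValueError("'*' is only allowed at the beginning and/or end of the name")

    def matches(name: str) -> bool:
        n = name.upper()
        if leading and trailing:
            return core in n            # '*te1*': core anywhere in the name
        if trailing:
            return n.startswith(core)   # 'ste*': names beginning with STE
        if leading:
            return n.endswith(core)     # '*12': names ending in 12
        return n == core                # no wildcard: exact match only

    return [name for name in names if matches(name)]


genes = ["STE2", "STE3", "STE12", "TRP1"]  # illustrative sample only
print(wildcard_search("ste*", genes))  # ['STE2', 'STE3', 'STE12']
print(wildcard_search("*12", genes))   # ['STE12']
```

The same prefix/suffix behaviour applies to the Expression Connection search form described below.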
Expression Connection is a tool for simultaneous searching of the results of multiple microarray studies for gene expression data for a given gene or ORF. The datasets searched and their references are listed on the Expression Connection page.
Expression Connection may be accessed directly from the SGD home page or from its link on the Search Options page. Each Locus Page has a link to Expression Connection in the "Additional Information" section, leading to an Expression Connection form already containing the name of the locus from which the tool was accessed. In addition, the "Functional Analysis" scroll box on the right side of the locus page allows the user to select individual functional genomics datasets and view the result for that locus.
To use Expression Connection, select a set of microarray datasets to search. Expression Connection can search one, several, or all of the datasets at a time in any combination. To choose the datasets you would like to search, click the gray buttons next to their names. By default, all datasets are searched. Click the "Submit" button to begin the search.
One gene or ORF name may be searched by Expression Connection at a time. Type the gene or ORF name into the box at the bottom of the form labeled "Enter a gene or ORF name." To retrieve a set of genes, the wildcard character (*) may be added at the end and/or beginning of the gene or ORF name. For example, to retrieve all the sterile genes, enter "ste*". This retrieves a list of gene names; clicking on any one leads to an Expression Connection form for that gene.
The primary convention followed by the vast majority of microarray experiments is this:
- Red means the mRNA level (expression) is increased relative to the standard.
- Black means the mRNA level (expression) is unchanged relative to the standard.
- Green means the mRNA level (expression) is decreased relative to the standard.
Accordingly, the first feature displayed on a locus's Expression page is a colorimetric scale bar, which is used to read relative expression levels from the two-color representation of the data.
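The red/black/green convention can be illustrated by mapping an expression ratio to a colour. This is a sketch under stated assumptions: the input is taken to be a log2(sample/standard) ratio, intensity is assumed to rise linearly and saturate at a hypothetical cap of 3.0, and `ratio_to_rgb` is an illustrative name, not the scale SGD actually draws.

```python
def ratio_to_rgb(log2_ratio: float, saturation: float = 3.0) -> tuple[int, int, int]:
    """Map a log2(sample/standard) ratio to an (R, G, B) colour.

    Positive -> red (increased), zero -> black (unchanged), negative -> green
    (decreased). Intensity grows linearly with |log2_ratio| and is capped at
    `saturation`, an assumed value chosen for illustration.
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    level = round(255 * min(abs(log2_ratio), saturation) / saturation)
    if log2_ratio > 0:
        return (level, 0, 0)
    if log2_ratio < 0:
        return (0, level, 0)
    return (0, 0, 0)


print(ratio_to_rgb(1.5))    # (128, 0, 0): moderately increased
print(ratio_to_rgb(-3.0))   # (0, 255, 0): strongly decreased
print(ratio_to_rgb(10.0))   # (255, 0, 0): capped, same colour as +3.0
```

Because every value beyond the cap is drawn at full intensity, strongly changed genes become indistinguishable by colour alone, which is why numeric values are also worth inspecting.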
Immediately below the scale bar is a table. Its first row contains the locus whose expression information was requested; the following 19 rows contain the genes whose expression is most highly correlated with that of the locus of interest.
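The text does not name the correlation measure, so the sketch below assumes Pearson correlation over complete expression profiles (no missing values) to show how a "top 19 correlated genes" list could be computed. Gene names and values are made up, and `top_correlated` is a hypothetical helper, not SGD's code.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return cov / (sx * sy)


def top_correlated(query: str, profiles: dict[str, list[float]],
                   n: int = 19) -> list[tuple[str, float]]:
    """Return up to n genes (excluding the query) ranked by correlation with it."""
    target = profiles[query]
    scores = [(gene, pearson(target, prof))
              for gene, prof in profiles.items() if gene != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]


# Made-up log ratios across three hypothetical experiments.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],   # same shape as GENE_A
    "GENE_C": [3.0, 2.0, 1.0],   # opposite shape
    "GENE_D": [1.0, 1.5, 1.0],   # roughly uncorrelated
}
for gene, r in top_correlated("GENE_A", profiles, n=2):
    print(gene, round(r, 3))
```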
Each row (ORF) contains the following columns, in order:
The two-color representation often does not differentiate enough between data points, especially at the extremes of the spectrum. Therefore, below the table is a graph of the numerical values for the locus of interest.
Two hyperlinks are located to the graph's left:
All SGD help resources are listed on the Help Resources page. The 'Help' button in the upper right corner of each tool and Locus page is linked directly to help documentation for that particular page.
Getting Started with SGD provides an introductory overview of SGD and how to use it.
The Glossary page lists definitions of genetic, bioinformatic, and other terms used in SGD.
SGD curators may be contacted by using a web form, by direct email to yeast-curator@genome.stanford.edu, or by fax at (650) 723-7016. We welcome comments and questions from the yeast community!
GO is a collaborative project, involving SGD and other model organism databases, to provide controlled vocabularies that are used to describe the molecular function and cellular location of gene products and the biological process in which they are involved. The three ontologies that comprise GO (Molecular Function, Cellular Component, and Biological Process) are used by multiple databases to annotate gene products, so that this common vocabulary can be used to compare gene products across species. The development of the ontologies is ongoing in order to incorporate new information.
The GO Consortium website is the central repository for GO information and documentation, and for the ontologies themselves. SGD's GO Help page provides a brief introduction to GO and how it is used at SGD. SGD's GO tutorial is a guided tour of GO annotations and GO tools at SGD.
The GO system is meant to be as broadly applicable as possible across different species, so species-specific jargon is avoided in the phrasing of terms. Where a term is relevant to several organisms but the process or component is fundamentally different in one of them, a 'sensu' phrase is appended to the term. For example, a term with 'sensu Saccharomyces' in its name represents a process or component as it occurs in the genus Saccharomyces.
Whenever a GO term is displayed on an SGD Locus page, that term is hyperlinked to a list of all gene products annotated to that term in SGD. You can search for a particular GO term by typing all or part of the term into the Quick Search box at the top of most SGD pages. This will return a list of all terms matching the search criterion, along with lists of gene products annotated to each term.
To download a list of all GO annotations at SGD, go to the Literature Curation section of SGD's FTP site and download the file "orf_geneontology.tab".
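Once the tab-delimited file is downloaded, it can be turned into a "GO term to genes" index locally. This is a sketch only: the column positions `GENE_COL` and `TERM_COL`, the treatment of lines starting with `!` as comments, and the function name are all assumptions, so check the README in the download directory for the file's real layout.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO

# HYPOTHETICAL column positions: confirm the real layout of orf_geneontology.tab
# in the README of the download directory before relying on these values.
GENE_COL = 0
TERM_COL = 1


def genes_by_go_term(handle: TextIO, gene_col: int = GENE_COL,
                     term_col: int = TERM_COL) -> dict[str, set[str]]:
    """Build a GO term -> set-of-genes index from a tab-delimited annotation file."""
    index: dict[str, set[str]] = defaultdict(set)
    # QUOTE_NONE: treat quote characters as ordinary text in a plain TSV file.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("!"):  # skip blank lines and '!' comments (assumed)
            continue
        if len(row) <= max(gene_col, term_col):
            continue  # skip short or malformed rows
        index[row[term_col]].add(row[gene_col])
    return dict(index)


# Made-up rows standing in for the downloaded file.
sample = io.StringIO("GENE_A\tterm_1\nGENE_B\tterm_1\nGENE_A\tterm_2\n")
print(sorted(genes_by_go_term(sample)["term_1"]))  # ['GENE_A', 'GENE_B']
```

With a real download, pass `open(path, newline="")` instead of the `StringIO` sample.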
SGD has two tools for analysis of GO classifications of groups of genes. The GO Term Mapper tool, which is explained in a detailed help document, takes a set of genes specified by the user and maps each to higher-level GO-Slim terms. The GO Term Finder tool, also with its own help page, takes the user's set of genes of interest and finds GO terms that are shared within the set.
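The statistics behind GO Term Finder are not described here. One common way to ask whether a GO term is shared by more genes in a set than chance would predict is a hypergeometric test; the sketch below shows that technique with the standard library only. The function name and the example numbers are illustrative assumptions, and a real analysis would also correct for the many terms tested (for example with a Bonferroni correction).

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance that at least k of n selected genes carry a GO term
    that K of the N background genes are annotated to (sampling without replacement)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    k = max(k, 0)
    # Sum exact integer counts of favourable draws, then divide once.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Illustrative numbers only: 5 of 20 genes in a list share a term that
# 50 of 6000 background genes carry.
print(hypergeom_pvalue(5, 20, 50, 6000))
```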
In assigning Gene Ontology (GO) terms, our aim is to annotate each function, process, and location of the gene product with a reference that establishes the classification as directly as possible. For example, a paper demonstrating the enzymatic activity of a protein would be chosen over one in which the enzymatic activity was suggested by a mutant phenotype. Exceptions to this may occur when there is extensive literature about a gene product; in this case, we may reference a review article with Traceable Author Statement (TAS) evidence.
Our aim is not to assemble a complete historical record of yeast research: if multiple papers provide the same type of evidence for a particular GO term, in general we only reference one of them. This may be the first paper that established that fact, or it may be the most recent paper published about the gene at the time the GO term was assigned. Thus, our failure to cite a particular paper for GO classifications in no way reflects on our opinion of its quality or relevance.
The current orientation is nothing more than a historical accident. The first map with chromosome numbers was published by Hawthorne and Mortimer (Genetics (1960) 45: 1085-1110). There were earlier maps by Lindgren and others, but they did not name the chromosomes. It is clear from the first map that the chromosomes were oriented with the long arm to the right of the centromere, and this orientation has not changed: since that 1960 paper, chromosomes I through X have kept the same orientation. Four more chromosomes (XI through XIV) were added in 1966 (Mortimer and Hawthorne, Genetics (1966) 53: 165-173). In these early maps many chromosomes had only one or two markers, so it was simply chance which markers were mapped first and thereby fixed the orientation. By 1973 there were seventeen chromosomes (Mortimer and Hawthorne, Genetics (1973) 74: 33-54). Between 1980 and 1985 it was realized that chromosome XVII was the killer plasmid and not a nuclear chromosome; luckily, no renumbering of the chromosomes was required.
The orientation of the genetic map was used to define which strand was Watson and which was Crick during sequencing. The Watson strand was defined as the strand running 5' to 3' from left to right.
The start and stop coordinates refer to the entire chromosome, where position 1 is the first nucleotide of the left arm of the chromosome. The Watson strand is defined as running 5' to 3' from the left to right telomere, and the Crick strand is complementary to it.
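Given this convention, a feature's sequence can be cut from a Watson-strand chromosome string and reverse-complemented when the feature lies on the Crick strand. A minimal sketch, assuming 1-based inclusive coordinates and assuming that a Crick-strand feature is listed with start greater than stop; the function names and the toy sequence are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Crick-strand reading (5' to 3') of a Watson-strand segment."""
    return seq.translate(COMPLEMENT)[::-1]


def feature_sequence(watson: str, start: int, stop: int) -> str:
    """Return a feature's sequence 5' to 3' on the strand it lies on.

    `watson` is the whole chromosome as the Watson strand, so position 1 is the
    first nucleotide of the left arm. Coordinates are 1-based and inclusive;
    start > stop is taken to mean a Crick-strand feature (an assumption).
    """
    lo, hi = min(start, stop), max(start, stop)
    if lo < 1 or hi > len(watson):
        raise ValueError("coordinates fall outside the chromosome")
    segment = watson[lo - 1:hi]  # convert 1-based inclusive to a Python slice
    return segment if start <= stop else reverse_complement(segment)


toy_chromosome = "ATGAAACATTTT"  # toy sequence, not real yeast DNA
print(feature_sequence(toy_chromosome, 1, 3))  # ATG (Watson strand)
print(feature_sequence(toy_chromosome, 9, 7))  # ATG (Crick strand, read from 'CAT')
```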
The first letter of each name is 'Y', denoting 'yeast'. The second letter denotes chromosome number (I=A, II=B, etc.). The third letter denotes the chromosomal arm where each locus resides: L for left and R for right. Numbers were assigned to open reading frames sequentially from the centromeres out towards each telomere. The final 'W' or 'C' in the systematic name refers to location on the Watson or Crick strand. So, for example, YNR045W is the 45th open reading frame from the centromere on the right arm of chromosome XIV.
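The naming scheme above is regular enough to decode mechanically. The sketch below parses nuclear ORF names of the form described (plus an optional lettered suffix); it is illustrative only, the names `parse_systematic_name` and `SystematicName` are hypothetical, and it does not attempt other kinds of systematic names.

```python
import re
from typing import NamedTuple, Optional

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]
# Y + chromosome letter (A=I ... P=XVI) + arm + 3-digit number + strand [+ -letter]
PATTERN = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-([A-Z]))?$")


class SystematicName(NamedTuple):
    chromosome: str        # Roman numeral, I..XVI
    arm: str               # 'L' (left) or 'R' (right)
    index: int             # ORF number counted out from the centromere
    strand: str            # 'Watson' or 'Crick'
    suffix: Optional[str]  # e.g. 'A' in YNR045W-A, else None


def parse_systematic_name(name: str) -> SystematicName:
    m = PATTERN.match(name.upper())
    if m is None:
        raise ValueError(f"not a nuclear ORF systematic name: {name!r}")
    chrom_letter, arm, number, strand, suffix = m.groups()
    return SystematicName(
        chromosome=ROMAN[ord(chrom_letter) - ord("A")],
        arm=arm,
        index=int(number),
        strand="Watson" if strand == "W" else "Crick",
        suffix=suffix,
    )


print(parse_systematic_name("YNR045W"))
# -> SystematicName(chromosome='XIV', arm='R', index=45, strand='Watson', suffix=None)
```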
When additional open reading frames are identified between genes, an '-A' is appended to the systematic name of the adjacent gene closest to the centromere, e.g. an open reading frame on the Watson strand between YNR045W and YNR046W would be named YNR045W-A. If a second open reading frame were identified on the Crick strand in this region, it would be named YNR045C-B. A third new open reading frame on the Crick strand would then become YNR045C-C. In summary, new ORFs that are identified between previously existing ORFs are given a letter designation in the order in which they are identified, independent of strand. Please see the "Systematic Names - Protein Coding ORFs" help page for more explanation, including diagrams and examples. For more information about systematic nomenclature of both protein coding and non-protein coding genes click here.
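The lettering rule can be expressed as a small helper that reproduces the YNR045W example above. This is a hypothetical illustration of the convention, not an SGD tool (actual names are assigned by SGD curators); it assumes the anchor is the unsuffixed name of the adjacent ORF closest to the centromere and that letters continue from the highest one already used in that interval.

```python
import re


def next_inserted_orf_name(anchor: str, strand: str, used_letters: list[str]) -> str:
    """Name a newly identified ORF lying just telomere-side of `anchor`.

    anchor: unsuffixed name of the adjacent ORF closest to the centromere, e.g. 'YNR045W'
    strand: 'W' or 'C', the strand of the NEW ORF
    used_letters: suffix letters already given out in this interval, e.g. ['A']
    Letters follow the order of identification, independent of strand.
    """
    if not re.fullmatch(r"Y[A-P][LR]\d{3}[WC]", anchor):
        raise ValueError(f"unexpected anchor name: {anchor!r}")
    if strand not in ("W", "C"):
        raise ValueError("strand must be 'W' or 'C'")
    letter = chr(ord(max(used_letters)) + 1) if used_letters else "A"
    if letter > "Z":
        raise ValueError("no suffix letters left")
    # Keep 'YNR045', then add the new ORF's own strand and the next letter.
    return f"{anchor[:-1]}{strand}-{letter}"


used: list[str] = []
for new_strand in ["W", "C", "C"]:
    name = next_inserted_orf_name("YNR045W", new_strand, used)
    used.append(name[-1])
    print(name)
# YNR045W-A
# YNR045C-B
# YNR045C-C
```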
This information can be found in the Systematic Sequencing Table, which is part of the Download Data section of SGD. This page also links to a set of guidelines for changing the systematic sequence.
Since the release of the S. cerevisiae genomic sequence in 1996, several updates have been made to the nucleotide sequence. The Sequence Updates table summarizes these changes.
The strain sequenced for the systematic genome project was S288C. The standard genotype of S288C is MATalpha gal2 mal. In addition, the chromosome IV sequence is from an S288C derivative that contained a nonsense (ochre) mutation in the TRP1 gene. The systematic sequence shows no defect in the GAL2 gene, so it is not certain whether the strain sequenced was actually a gal2 mutant. The MAL genes are a heavily repeated gene family found near the telomeres of some yeast strains and are not present in S288C.
A description of S288C is available from the American Type Culture Collection (ATCC):
- ATCC Number: 26108
- Organism: Saccharomyces cerevisiae Hansen
- Depositors/Designations: E. Cabib S288C <--- R.K. Mortimer. [MUCL 38902]
- Mating type/Genotype: MATalpha SUC2 mal gal2 CUP1 (E. Cabib, personal communication).
- Description/References: MATalpha sta1 sta2 sta3 STA10 (Curr. Genet. 7: 109-112, 1983). Production of: chitin synthetase zymogen (Biochem. Biophys. Res. Commun. 50: 186-191, 1973); methionine specific tRNA (Hoppe-Seyler's Z. Physiol. Chem. 352(s): 1231-1247, 1971); a pyrimidine-specific endoribonuclease (J. Bacteriol. 164: 57-62, 1985); and fructose 1,6-diphosphate from molasses (Biotechnol. Lett. 14: 495-498, 1992). Ribosome-binding assay (J. Biol. Chem. 246: 5854-5856, 1971). Structure of ribosomal subunits (Micron Microsc. Acta 23: 273-286, 1992). Transformation host (Biotechnol. Appl. Biochem. 17: 305-310, 1993). Wild type.
- Growth Conditions: Medium 200, 30C
- Shipped: Freeze-dried
- Price Code: C
- Packing Class: 1
- Revised: Jun 20 1997
S288c sub-strains used for sequencing each chromosome:
| Chr. no | S288c sub-strain |
|---|---|
| I | AB972 |
| II | genuine strain S288c |
| III | XJ24-24a and AB972 for 280 kb, A364a and DC5 for the remaining 35 kb |
| IV | AB972 |
| V | AB972 |
| VI | AB972 |
| VII | FY1679 |
| VIII | AB972 |
| IX | AB972 |
| X | FY1679 |
| XI | FY1679 |
| XII | AB972 |
| XIII | AB972 |
| XIV | FY1679, with ~ 5 kb from A364a (at ~ 183 kb - 188 kb) |
| XV | FY1679 |
| XVI | AB972 |
Historically, it has been estimated that there are roughly 6000 total genes in the yeast genome, of which 5885 were estimated to encode proteins. These estimates were based on the sequencing of the yeast genome (see Mewes et al. and Goffeau et al.).
SGD has since revised this estimate based on the integration of two new pieces of information:
There are many software packages that can do this type of analysis. The GCG software package contains the Composition tool, which will determine the composition of sequences along with dinucleotide and trinucleotide content for nucleotide sequences. Click here for an example of the results from the Composition program run on the yeast genome.
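For readers without GCG, overlapping base, dinucleotide, and trinucleotide counts are straightforward to compute. This is a stand-in sketch, not the GCG Composition program, and it does not reproduce that tool's output format; skipping windows that contain characters other than A, C, G, or T is an assumption, and the function names are illustrative.

```python
from collections import Counter


def kmer_composition(seq: str, k: int) -> Counter:
    """Count overlapping k-mers: k=1 bases, k=2 dinucleotides, k=3 trinucleotides.

    Windows containing anything other than A, C, G, T are skipped (an assumption).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def gc_content(seq: str) -> float:
    """Fraction of counted bases that are G or C."""
    bases = kmer_composition(seq, 1)
    total = sum(bases.values())
    return (bases["G"] + bases["C"]) / total if total else 0.0


print(kmer_composition("ATATGC", 2))  # AT twice; TA, TG, GC once each
print(gc_content("ATATGC"))
```

For a whole chromosome, read the sequence from a downloaded FASTA file, join its lines, and pass the resulting string to these functions.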
The S288C strain that was sequenced for the systematic sequencing project has a nonsense mutation in TRP1; the sequence in SGD reflects this. Check the GenBank sequences associated with the TRP1 locus page (for example, V01341.1).
When the systematic sequencing of the yeast genome was performed, only two repeats of the rDNA region on chromosome XII were sequenced. The RDN5, RDN37, RDN58, RDN25, and RDN18 loci shown on the chromosomal map are meant to represent an rDNA repeat unit. RDN37 represents the primary transcript that is processed into the 25S, 18S and 5.8S rRNAs. RDN25, RDN18, and RDN58 represent the 25S, 18S, and 5.8S rRNAs, respectively, and RDN5 represents the 5S rRNA. RDN1 is a Locus entry that represents an entire rDNA region, but it does not have associated sequence or position information and does not appear on the map.
SGD does not keep any yeast stocks. Previously, the Yeast Genetic Stock Center was maintained at Berkeley. This facility has closed, and the strains have been transferred to the American Type Culture Collection (ATCC).
The following are sources for mutant strains:
SGD does not keep any yeast clones. They may be ordered from the ATCC. Invitrogen sells "GeneStorm Yeast Expressing Clones" containing S. cerevisiae open reading frames in an expression vector (search the website for "yeast clone").
There are a couple of sources for a list of yeast genes with introns:
Unfortunately, we cannot help you directly: SGD is a scientific database that provides researchers with information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. We are not medical doctors and cannot give medical advice. You should speak to a qualified physician about any medical concerns. To find out more about pathogenic yeast infections such as Candidiasis, you can go to a medical library at a local university, search the PubMed database for relevant literature, or browse the Candidiasis information at MedlinePlus.
Our World-Wide Web Virtual Library page has several links to basic information sources for yeast.
The folks at Lesaffre and Red Star have a very nice web site with a lot of basic information about yeast. You can send away for a free handbook with information about using yeast in cooking, as well as some ideas for experiments to see how yeast grows.
The nutritional components of yeast can be found at Lesaffre.
SGD now has a page of Yeast Images as part of the WWW Virtual Library: Yeast. You are welcome to use these.
Last update: 2004-03-18 SRE