SGD Help: Gene Ontology (GO)
The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations.
The project started as a collaboration among three model organism databases: the Saccharomyces Genome Database (SGD), FlyBase (for Drosophila), and Mouse Genome Informatics (MGI). The GO Consortium has since expanded considerably to include many additional model organism databases and annotation groups. Depending on the nature of its affiliation, each member contributes to the development of the ontologies, the generation of GO annotation files, or the development of software tools that use GO.
Within SGD, GO annotations are used to describe what gene products do and where they are located. GO annotations therefore appear directly on the Locus Summary pages for both protein-coding and non-coding RNA genes. More detail about the GO annotations and the GO terms is available on additional pages. GO tools such as the GO Term Finder and the GO Slim Mapper use the GO annotations to analyze sets of genes and identify common functions, processes, or locations.
- What is GO?
- What is a GO Annotation?
- Annotation Methods
- Accessing GO Annotations in SGD
- Accessing the AmiGO Browser
The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products. The name and definition for each GO term and the parent-child relationships between terms are defined by the members of the GO Consortium. This combination of a controlled vocabulary of defined terms with a structure of relationships between items is referred to as an ontology. See the GO Consortium's An Introduction to the Gene Ontology for a basic introduction to the Gene Ontologies.
This diagram shows a small portion of the Biological Process ontology. Terms at the top represent broader, more general concepts, while terms lower down represent more specific concepts. When referring to the structure between terms, a term that has terms below it is referred to as a parent term, while the terms below it are referred to as child terms. Note that each term is a parent with respect to the terms below it and a child with respect to the terms above it. There are two different relationship types between terms. Although the relationship types are not shown in SGD, they are shown in the AmiGO browser. Note that the Gene Ontologies themselves contain only information about the terms in the ontology and their relationships to other terms. They do not contain the gene products of any specific organism.
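The parent and child vocabulary above can be made concrete with a small sketch. The term names and links in `PARENTS` are hypothetical, chosen only for illustration; they are not taken from the diagram or from the actual ontology, and the two relationship types are not distinguished here.

```python
# Hypothetical mini-ontology: each term maps to its direct parent terms.
# The links are illustrative only, not copied from the real GO.
PARENTS = {
    "biological_process": [],
    "cellular process": ["biological_process"],
    "cell cycle": ["cellular process"],
    "cell division": ["cellular process"],
    "cytokinesis": ["cell cycle", "cell division"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    found = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def children(term, parents=PARENTS):
    """Return the direct child terms of a term."""
    return {child for child, direct in parents.items() if term in direct}
```

In this sketch a term may have more than one parent, so "cell cycle" is a child of "cellular process" and, at the same time, a parent of "cytokinesis".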
To provide specific information about gene products, a GO term, e.g. cytokinesis, is associated with a gene or gene product, e.g. ACT1 or Act1p, to form a GO annotation. In addition to the association between a gene product and a GO term, a GO annotation must also include a specific reference, an evidence code, and the date on which the annotation was made. A basic GO annotation thus includes these pieces of information:
|Gene (or gene product)||e.g., ACT1|
|GO term||e.g., cytokinesis|
|Reference||The reference contains data or statements which support the annotation, or a description of the method by which the annotation was assigned.|
|Evidence code||The evidence code gives a basic indication of the type of data or statement that supports the annotation. More information about the GO evidence codes is found in the Guide to GO Evidence Codes.|
|Date||The date on which the annotation was assigned or reviewed.|
These basic, essential parts of a GO annotation are all displayed on SGD's GO Evidence and References pages; see, for example, this one for the ACT1 GO Annotations.
This diagram shows a portion of the GO Biological Process ontology along with the GO Biological Process annotations of the genes BUD3, BUD4, and AXL2. As demonstrated by the annotations for BUD3, genes can be annotated with multiple GO terms which are at various levels within the ontology, depending on the experiments and type of evidence available to annotate each gene.
In addition to the basic, essential components of a GO annotation, there are some optional pieces of information that may be associated with the GO annotation when appropriate. These include a qualifier and the with field:
|Qualifiers||Qualifiers modify the interpretation of an annotation. Three qualifiers are currently allowed: NOT, contributes_to, and colocalizes_with. For a detailed explanation of the qualifiers, please see the Using the Qualifier column section of the GO Annotation Conventions guidelines.|
|With/From Field||For some evidence codes, it is useful to specify a second object that the gene being annotated interacted with or was compared to. For example, for Inferred from Genetic Interaction (IGI), it is useful to specify which other genes were involved in a genetic interaction with the gene being annotated. Similarly for Inferred from Physical Interaction (IPI), the "with field" specifies the gene products with which the gene product being annotated interacted. When used for Inferred from Sequence or Structural Similarity (ISS), the "with field" indicates what the gene being annotated was compared to in a sequence-based analysis. For the evidence code Inferred by Curator (IC), this column contains the GOID of the GO term(s) used as the basis of the curator's inference.|
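The required and optional parts of an annotation described above can be summarized as a record type. This is a minimal sketch, not SGD's data model: the class name, the field names, and the placeholder values for reference, evidence code, and date are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# The three qualifiers listed in the table above.
ALLOWED_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


@dataclass(frozen=True)
class GOAnnotation:
    """Hypothetical record holding the parts of one GO annotation."""

    # Required parts
    gene: str
    go_term: str
    reference: str
    evidence_code: str
    assigned_on: date
    # Optional parts
    qualifier: Optional[str] = None
    with_from: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject qualifiers other than the three allowed values.
        if self.qualifier is not None and self.qualifier not in ALLOWED_QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier!r}")


# Gene and term come from the example in the text; the other values
# are placeholders, not real annotation data.
example = GOAnnotation(
    gene="ACT1",
    go_term="cytokinesis",
    reference="PLACEHOLDER_REFERENCE",
    evidence_code="PLACEHOLDER_CODE",
    assigned_on=date(2000, 1, 1),
)
```

Keeping the qualifier and the with/from values optional mirrors the text: every annotation carries the five basic parts, while the other two appear only when appropriate.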
The use of GO terms to annotate gene products in many databases facilitates uniform queries across multiple species.
Using both the Gene Ontologies and GO annotations, tools can be built that display gene products alongside the GO terms to which they are annotated, or that find gene products involved in similar biological processes, molecular functions, or cellular components. The GO Consortium develops and maintains the AmiGO browser as a means of querying either GO terms or the gene products annotated to them.
To differentiate annotations made from published small-scale experiments, genome-wide or high-throughput experiments, and computational predictions, we have separated GO annotations at SGD into three sets:
|Manually curated GO annotations||Manually curated GO annotations reflect our best understanding of the basic molecular function, biological process, and cellular component for a gene product. They are assigned by SGD curators, who read the literature for each gene and make annotations from published papers when available. Such annotations may be based on experiments, sequence similarity, or other computational analyses described in the paper, or on statements made by the authors. Curators periodically review all manually curated GO annotations for accuracy and completeness and update them as necessary, adding new annotations to reflect advances in knowledge and removing any annotations that are no longer supported by the literature. The Last Reviewed on: date on the GO evidence and references page for a gene indicates when an SGD curator reviewed all of the manually curated GO annotations for that gene. SGD also reviews and incorporates manual GO annotations for S. cerevisiae proteins from the GO Annotation (GOA) project at UniProt. These annotations can be identified at SGD by the source, e.g., 'Uniprot', 'MGI', 'HGNC' (GO Consortium members), displayed in the 'Assigned By' column of the GO evidence and references page.|
|High-throughput GO Annotations||GO annotations from high-throughput experiments are assigned based on a variety of large-scale experiments, including genome-wide experiments. Many of these annotations are based on GO annotations (or mappings to GO annotations) assigned by the authors rather than by SGD curators. While SGD curators read these publications and often work closely with authors to incorporate the information, not every individual annotation is necessarily reviewed by a curator. GO annotations from high-throughput experiments are assigned only when this type of data is available, and thus may not be assigned in all three aspects of the Gene Ontologies.|
|Computational GO Annotations||Computational GO annotations are made by a variety of computational methods, such as sequence similarity methods, including protein domain motifs, and keyword mapping files. When annotations based on computational methods are NOT reviewed by a curator, they are placed in the Computational GO annotations section. Note that when annotations supported by a computational method, such as sequence analysis, are reviewed by a curator, they may be found in the Manually curated section.|
At SGD, curators read the research literature and associate specific GO terms with the appropriate gene products to provide information about the state of knowledge of the yeast genome. We are constantly updating our GO annotations and always welcome suggestions for improvements or corrections when the understanding of a gene has changed since we last reviewed its literature.
Users can search for GO terms in any of the three Gene Ontologies that match a text query, e.g. "bud", using the Search box located at the top of SGD pages. The search result is a list of matches for the query term. Clicking on the "Gene product activities (GO Molecular Function)", "Cellular roles or processes (GO Biological Process)", or "Protein complexes and locations (GO Cellular Component)" links from the results page will provide lists of GO terms containing the query string.
Users can search for GO terms whose GOIDs (minus the "GO:" prefix and leading zeroes) match a purely numerical query, e.g. "5685", using the Search box located at the top of SGD pages. The search result will usually be the GO term whose GOID matches the query. Occasionally, the search result will be a list of matches for the query term, where clicking on the Gene Ontology ID link will take you to the associated GO Term page.
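The relationship between a numeric query and a full GOID can be sketched as follows. This is an illustration only, not SGD's search code; the function names are hypothetical, and the sketch relies on GOIDs having the form "GO:" followed by seven digits.

```python
import re


def goid_from_query(query):
    """Return 'GO:NNNNNNN' for a purely numeric query, otherwise None."""
    text = query.strip()
    if not re.fullmatch(r"[0-9]{1,7}", text):
        return None
    # Restore the leading zeroes and the prefix that the query omits.
    return "GO:" + text.zfill(7)


def numeric_part(goid):
    """Strip the 'GO:' prefix and any leading zeroes from a GOID."""
    digits = goid[3:] if goid.startswith("GO:") else goid
    return digits.lstrip("0")
```

For example, `goid_from_query("5685")` returns `"GO:0005685"`, while a text query such as `"bud"` returns `None` and would be handled as a term-name search instead.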
At SGD, you can find GO annotations displayed at various levels of detail in three locations as described below.
|Locus Summary page||Each Locus Summary page, like this one for RCL1, lists the GO terms, with associated evidence codes, that SGD curators have used to annotate the gene of interest. From the Locus Summary page, clicking on the GO evidence and references link takes you to the GO Annotations page for that gene, while clicking on a GO Term name will take you to the corresponding GO Term page.|
|GO Annotations page||This page, for example the RCL1 GO Annotations page, lists all the GO terms that have been used to annotate the particular gene, along with the specific reference(s) used to make each annotation and the evidence code(s) describing the type of evidence or statement found in that reference. There are two main sections, the first for Manually curated GO Annotations, the second for GO Annotations from high-throughput experiments. Within each section, annotations from each of the three aspects of the Gene Ontology, Molecular Function, Biological Process, and Cellular Component, are found in individual sections.|
|GO Term page||Clicking on a term name, from either of the pages described above, takes you to the GO Term page for that term, for example this one for rRNA processing. The GO Term page provides specific information about the GO term, listing any synonyms or alternative phrases for the term name, the definition for the term, the aspect of the gene ontology (biological process, molecular function, or cellular component) to which it belongs along with its GOID number (a unique numerical identifier), and a graphical view showing the relationship between this term and others in the ontology. Annotations of genes within SGD are summarized in a table, followed by a complete listing of all genes in SGD that have been annotated to the term, along with the relevant reference and evidence code for each annotation. The GO term page also provides access to the AmiGO browser via the icon link.|
The GO Slim Mapper identifies the major branches of the ontologies common to a list of genes or ORFs, based on their GO annotations. The GO terms that represent the major branches of the ontology are higher-level terms, also known as GO slim terms. This mapping is possible because parent-child relationships are recorded between the granular terms and the high-level GO slim terms. For more information, see the GO Slim Mapper help page.
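The idea of mapping granular annotations up to high-level slim terms can be sketched in a few lines. Everything in this example is hypothetical: the ontology fragment, the choice of slim terms, and the gene names are invented for illustration and do not reflect the tool's actual implementation or data.

```python
# Hypothetical ontology fragment: term -> direct parent terms.
PARENTS = {
    "biological_process": [],
    "cell cycle": ["biological_process"],
    "transport": ["biological_process"],
    "cytokinesis": ["cell cycle"],
    "vesicle-mediated transport": ["transport"],
}

# Hypothetical slim set and gene annotations.
SLIM_TERMS = {"cell cycle", "transport"}
ANNOTATIONS = {
    "GENE_A": {"cytokinesis"},
    "GENE_B": {"vesicle-mediated transport"},
    "GENE_C": {"cytokinesis", "vesicle-mediated transport"},
}


def with_ancestors(term):
    """Return the term itself plus every term above it."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(genes):
    """Group genes under each slim term reachable from their annotations."""
    result = {slim: [] for slim in sorted(SLIM_TERMS)}
    for gene in genes:
        reached = set()
        for term in ANNOTATIONS.get(gene, ()):
            reached |= with_ancestors(term) & SLIM_TERMS
        for slim in reached:
            result[slim].append(gene)
    return {slim: sorted(members) for slim, members in result.items()}
```

Because a gene is counted under a slim term whenever any of its annotated terms has that slim term as an ancestor, a gene with several granular annotations can appear under more than one slim term.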
The GO Term Finder searches for significant shared GO terms, or parents of those GO terms, used to describe a set of genes or ORFs. It helps you understand what the genes/ORFs you are studying have in common. Results are displayed in both graphic and table form. The graphic view shows the parent-child relationships (DAG view) of the GO terms that are used to annotate the genes/ORFs. For more information, see the GO Term Finder help page.
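One common way to judge whether a GO term is shared by a gene set more often than expected by chance is a hypergeometric test. The text above does not say which statistic the tool uses, so the following is only a sketch of that general technique under this assumption; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: total number of genes in the background set
    annotated:  genes in the background annotated to the GO term
    sample:     number of genes in the query list
    hits:       genes in the query list annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the term is over-represented in the query list. A real analysis over many terms would also need a correction for multiple testing, which this sketch omits.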
SPELL (Serial Pattern of Expression Levels Locator) is an analysis tool for microarray data that facilitates the rapid identification of the most informative datasets and co-expressed genes based on patterns of expression shared with a query gene or genes. Search results also display GO term enrichment for the genes of interest and other genes that have similar expression patterns. This helps identify relationships between a large number of genes with similar expression profiles.
In SGD, the AmiGO browser is accessible from all GO Term pages, where the icon link takes you directly to the corresponding AmiGO page for that GO term. AmiGO allows you to find genes from other organisms, as well as those from S. cerevisiae, that have been annotated to a specific GO term. On the AmiGO page for a GO term, you can view a clickable tree (DAG) view of the GO term and a list of all genes that have been annotated either directly to the term or to any of its child terms.