SGD Help: Locus Page
The locus page represents the focal point for information about all
chromosomal features in the Saccharomyces cerevisiae genome. This
includes information about the gene, and the encoded RNA and protein,
as well as structural features. This information is constantly being
updated as new literature, datasets and external links become available.
Efforts are also being made to reference all information presented
on the locus page. What follows is a description of the informational
content presented on the locus page.
The SGD locus page is divided into three sections, BASIC INFORMATION,
RESOURCES and ADDITIONAL INFORMATION, which are described in
order below. At the request of the S. cerevisiae community,
we have also created an Alternative Single Page Format for the Locus Page. The
Single Page Format presents the same information available from the Locus Page,
but arranges it sequentially on one long page. This format is accessible by
clicking on the "Alternative single page format" link located at the top of
all locus pages.
The Basic Information section lists the standard locus name, the
systematic name, aliases, and retired gene names, and also indicates
whether a name is standard or reserved.
In addition, it includes the following information:
- Feature Type: indicates what feature type resides on this
chromosomal sequence. Feature types currently include:
ORF, rRNA, snRNA, snoRNA, tRNA, ncRNA, telomeres and associated
subfeatures (telomeric repeat, X element combinatorial repeats,
X element core sequence, Y' element), centromere, ARS elements,
retrotransposon, LTR, transposable element gene (TyORF), pseudogene,
as well as features that are "not in the systematic sequence of S288C"
and those that are "not physically mapped".
ORFs receive additional designations indicating the degree of certainty that
they actually encode proteins. ORFs are classified as
"Dubious", "Uncharacterized", or "Verified".
- Description: contains a concise summary or description
of the function and biological role of each gene product.
- Gene Ontology (GO) annotations:
The Gene Ontology (GO) Annotations (Molecular Function,
Biological Process, and Cellular Component) describe a gene's
molecular function(s), its broad role in biological processes, and its presence in
subcellular locations, structures or macromolecular complexes. Each
annotation links to a page showing all yeast genes annotated to that
term. The GO annotations use a controlled vocabulary that allows
powerful searches within SGD and across other databases. For more
information about GO at SGD, please see the
SGD help page for GO.
- Pathways: lists all the metabolic pathways in which the
gene product functions. These pathway displays were created using the
Pathway Tools software developed by Peter Karp and his associates at
SRI. The Main Query Page offers various options to look at all Yeast
Biochemical Pathways, and the
SGD help page on pathways provides more details on how to use and
navigate the information available in the Yeast Biochemical Pathways.
- Name Description: contains the expanded form of the standard
name, as described in the literature.
- Gene Product: a description of the protein or RNA product
encoded by the gene.
- Regulatory Role: lists DNA binding motifs for the
encoded protein. Details and associated references for the binding
motifs are available via a link from this section. This section also
provides links to predicted regulatory modules in which the protein
may be involved [as predicted by Segal et al. (2003)].
- Mutant Phenotype: lists mutant
phenotype(s) for the gene, including those reported by the
systematic deletion project consortium. Details and associated references are
available via a link located at the top of this section.
- Interactions: provides a link to a list of physical and
genetic interactions identified between the encoded protein or gene
and other proteins or genes, including those curated at other
databases (GRID, BIND and MIPS).
- Physical Interactions: provides a link to a list of
physical interactions identified between the encoded protein and
others, including those curated at other databases (GRID, BIND and
MIPS). Within this section, physical interactions have been organized
according to the technique used to identify them.
- Genetic Interactions: provides a link to a list of genetic
interactions identified between this gene and other genes, including
those curated at other databases (GRID, BIND and MIPS). Within this
section genetic interactions have been organized according to the
interaction type.
- Sequence Information: lists the chromosomal coordinates of
the S288C derived gene, feature or subfeature, as well as its genetic
position, when applicable. There are links to two genome browsers: the
Chromosomal Features Map, and
GBrowse. The dates on which the S288C derived sequence and the
coordinates last changed are listed in the Last Update section in the
following format: year-month-day (for example, 2000-05-19). There are
two types of dates in the "Last Update" section:
- Coordinates: indicates the date when the
chromosomal coordinates of the feature were last changed. In
most cases this is likely due to an insertion or deletion to the left
of the feature, resulting in a shift of all chromosomal coordinates
for features located to the right of the insertion or deletion.
- Sequence: indicates the date when the sequence of
the feature was last changed. This can be due to a sequence change
within the feature, a change in the intron/exon structure of the
feature or an extension or deletion of the feature at either the 5' or
3' end. At the present time, the oldest date displayed is 2000-05-19.
The relative coordinates and date of the most recent update for associated
subfeatures are listed in the Subfeatures Details section. Finally, this
section also provides links to the S288C derived sequence of the Genomic
DNA, the Coding Sequence and the ORF Translation.
- External Links: provides links to other information sources
for the gene. More information on the databases can be found in SGD's
glossary.
The external links were generated using the following methods:
- All Associated Sequences: directs you to the
All Associated Sequences page, which lists sequences (including the
S288C reference sequence and sequences from other strains)
found in various external databases, including the GenBank/DDBJ/EMBL
nucleotide and protein databases. See the All Associated Sequences
help page for more information.
- E.C.: the Enzyme Commission (E.C.) number is
plugged into the search program at NiceZyme.
- Entrez Gene: Gene IDs are first downloaded from NCBI RefSeq.
Gene name data are provided to NCBI RefSeq by SGD. The gene names and
corresponding Gene IDs are parsed out of the RefSeq chromosome flat files.
The links are generated by plugging in the NCBI Gene IDs, downloaded from
NCBI RefSeq, into the NCBI
Entrez Gene's search program.
- Entrez RefSeq Protein: RefSeq Protein Version IDs are downloaded
from NCBI RefSeq. Gene name data are provided to NCBI RefSeq by SGD. The gene
names and corresponding RefSeq Protein Version IDs are parsed out of the
RefSeq chromosome flat files. The links are generated by plugging in the
RefSeq Protein Version ID into the NCBI Entrez RefSeq search program. The files
are located at:
ftp://ftp.ncbi.nih.gov/genomes/Saccharomyces_cerevisiae/
- MIPS: the
ORF name is plugged into the Munich Information Center for
Protein Sequences (MIPS)
database search program.
- SwissProt: SwissProt IDs are linked to SGD ORFs based on
scores obtained through a
BLASTP search against the SGD ORFP dataset. The results of the
BLASTP searches are manipulated in various ways based on the assumption
that there exists a 1:1 relationship between SGD and SwissProt's
S. cerevisiae entries.
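As a rough illustration of the "Coordinates" date described under Sequence Information above, the following Python sketch shows how an insertion to the left of a feature shifts its coordinates without altering its sequence. It is not SGD's implementation: the Feature class, the apply_indel function, the feature names and the coordinates are all hypothetical, and it assumes the insertion or deletion falls between features rather than inside one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Feature:
    """Hypothetical record for one chromosomal feature."""
    name: str         # made-up feature label
    start: int        # 1-based chromosomal start coordinate
    stop: int         # 1-based chromosomal stop coordinate
    coord_date: date  # "Coordinates" last-update date
    seq_date: date    # "Sequence" last-update date


def apply_indel(features, position, delta, when):
    """Shift every feature that starts to the right of `position`.

    delta > 0 models an insertion of `delta` bases, delta < 0 a deletion.
    Assumption: the change lies between features, so no feature's own
    sequence is altered and only the coordinate date is updated.
    """
    for f in features:
        if f.start > position:
            f.start += delta
            f.stop += delta
            f.coord_date = when  # coordinates changed, sequence did not
    return features


# Hypothetical example: two made-up features, both last changed on 2000-05-19.
baseline = date(2000, 5, 19)
features = [
    Feature("featA", 100, 200, baseline, baseline),
    Feature("featB", 500, 800, baseline, baseline),
]

# A 3-base insertion at position 300, recorded on an arbitrary date.
apply_indel(features, position=300, delta=3, when=date(2004, 1, 15))

for f in features:
    print(f.name, f.start, f.stop,
          f.coord_date.isoformat(), f.seq_date.isoformat())
```

Only featB, which lies to the right of the insertion, receives the new Coordinates date; its Sequence date is untouched. The dates are printed in the same form as the Last Update section, for example 2000-05-19.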
The Resources Section provides resources for analysis and additional
information retrieval for a particular gene. The Resources section has
the following information retrieval options, organized into pull-down menus:
- Graphic Genome Viewers: At the top of the Resources Section
are two graphic representations of the genetic features that surround
a particular chromosomal location. Clicking on either the SGD ORF map
(also known as the Chromosomal Features Map) or
GBrowse will take you to an expanded view of this region of the
genome so that the feature of interest can be viewed in the context of
its chromosomal surroundings.
- Literature:
- Literature Guide: retrieves SGD's
Literature Guide page, which
organizes papers about a gene by various topics.
- Community Annotation: retrieves research highlights,
as submitted by members of the yeast community. See the
Community Annotation form to submit a research highlight.
- GermOnline: retrieves available information from
GermOnline,
a cross-species community annotation knowledgebase located at the
University of Basel, that provides access to basic locus information,
expression data, protein/proteomic information and access to various
alignment, modelling and cluster analysis tools.
- Search Google Scholar: takes you to the results of a
Google Scholar search where
the Standard or Systematic Name and "cerevisiae" have been used
as keywords.
- Search PubMed Only: searches the abstracts of papers in
PubMed for the co-occurrence
of locus name(s) (or systematic name) and Saccharomyces cerevisiae.
- Search All NCBI (Entrez): takes you to the results of an
NCBI
Entrez search where the Standard or Systematic Name and
"cerevisiae" have been used as keywords in a search of NCBI
databases.
- Retrieve Sequences: retrieves the sequence of the feature's
Genomic DNA (includes introns), with or without 1 kb of upstream
and downstream sequence, as derived from the S288C reference strain,
the Coding Sequence (CDS)
and the ORF Translation (as appropriate). In addition, a link to
All Associated Sequences
provides access to both S288C and non-S288C sequences stored in
various external databases. Finally, a Custom Retrieval option
allows users to perform custom sequence queries using SGD's
Gene/Sequence Resources.
- Sequence Analysis Tools: retrieves both internally processed
BLASTP,
BLASTN,
FASTA aa,
FASTA nt searches and the
results of an externally processed
BLASTP at NCBI. A
Restriction Map of the sequence or just the predicted restriction fragment
sizes, a Web Primer tool used
to design primers for PCR or sequencing, and a Six-Frame
Translation overview with restriction sites can also be retrieved.
- Protein Info & Structure: provides links to basic
Protein Information, as well as
links to mass spec. data at
GPM DB, information on predicted protein Motifs generated using
eMOTIFs developed at
Stanford University, information about proteins of known structure
with sequence similarity to the locus of interest using the SGD
resource, PDB Homologs and
structural assignments to protein sequences at the superfamily level
using the
SCOP superfamily resource, co-developed at Stanford University and
the MRC.
- Localization Resources: provides access to several
external databases that contain localization data for many yeast
proteins, including the
Yeast GFP Localization Database
at UCSF, the
YGAC Triples Database at Yale, and the
Yeast Protein Localization Database at University Graz.
- Interactions: provides access to several external
databases containing genetic and physical interaction data including:
the
Yeast Resource Center Informatics Platform at the University
of Washington, the
Biomolecular Interaction Database (BIND) at the University of
Toronto, the
Database of Interacting Proteins (DIP) at UCLA, the
General Repository of Genetic and Physical Interactions (GRID)
at the University of Toronto, and the
PortalPath Calling Yeast Interaction Database, at CuraGen
Corporation.
- Phenotype Resources: provides access to several external
databases containing phenotype data including:
the Profiling of
Phenotypic Characteristics in Yeast (PROPHECY) at Göteborg University,
the Saccharomyces
cerevisiae Morphological Database (SCMD) at the University of
Tokyo, the
Yeast Proteins Functional Assignment Database at UCLA and the
YGAC Triples Database
at Yale.
- Maps and Displays: retrieves several locus-centered maps or
graphs, including a
Map display of Chromosomal Features, a GBrowse map showing the
position of both ATCC and WashU clones, a table of flanking features
for viewing and downloading neighboring chromosomal features, a
Combined Physical & Genetic Map that
displays a side-by-side representation of the combined physical and
genetic maps, and a graph displaying
Physical/Genetic Map Ratios.
- Comparison Resources: contains links to tools for comparing
DNA and protein sequences as well as chromosomal arrangements between
S. cerevisiae and other species. For more information on the
BLAST programs, please see NCBI's
BLAST program selection guide.
- PSI-BLAST Results: retrieves the results of a
Sequence Similarity Query using the protein sequence of the
ORF as the query for
PSI-BLAST analysis against NCBI's non-redundant (nr) protein dataset.
- Ashbya Homologs (AGD): provides a direct link between
the S. cerevisiae locus page and the Ashbya gossypii
ortholog at the Ashbya Genome Database
(AGD) located at the University of Basel.
- BLASTN vs. fungi: facilitates the comparison of the
nucleotide sequence of the gene of interest to fungal nucleotide
sequence datasets gathered from GenBank.
- BLASTP at NCBI: provides a direct link to BLASTP at NCBI, to
facilitate the comparison of the amino acid sequence of the protein
of interest to all non-redundant GenBank CDS translations +
PDB +
SwissProt +
PIR + PRF excluding environmental samples.
- Candida Homologs (CandidaDB): provides a direct link between
the S. cerevisiae gene and the respective gene from the fungal
pathogen Candida albicans identified using
BLAST.
- Fungal Alignment: displays a ClustalW based
Fungal Sequence Alignment of the Saccharomyces cerevisiae
protein with identified orthologs in other fungal species.
- Model Organism BLASTP Best Hits: displays
Model Organism Best Hits resulting from an NCBI BLASTP
analysis using the S. cerevisiae protein sequence of interest
to query predicted protein sequences from several model organisms
that currently include: Saccharomyces cerevisiae, Arabidopsis thaliana, Ashbya gossypii,
Caenorhabditis elegans, Drosophila melanogaster and
Homo sapiens.
- Synteny Viewer: provides a regional map based view of the
synteny
that exists between Saccharomyces cerevisiae and the closely
related yeast species: Saccharomyces paradoxus,
Saccharomyces mikatae, and Saccharomyces bayanus. Please
see the SGD
Synteny Viewer help page for more details.
- Functional Analysis: retrieves a summary of expression data
for the gene of interest from any of several large-scale microarray
experiments using Expression
Connection and in addition accesses three external databases:
- GermOnline: provides access to basic locus information,
expression data, protein/proteomic information and access to various
alignment, modelling and cluster analysis tools.
- YGAC Triples: retrieves gene expression data from transposon
insertion experiments, as well as gene descriptions and protein
localization data.
- Yeast Microarray Global Viewer: displays a graphical
representation of gene expression data from published genome-wide
experiments, as well as other useful resources.
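Among the Sequence Analysis Tools listed above is a Six-Frame Translation. The Python sketch below shows the general idea of translating a DNA sequence in all six reading frames (three on each strand). It is only an illustration and not the code SGD runs: the function names are hypothetical, restriction sites are not reported, and it assumes the standard genetic code, which would not be correct for mitochondrial features.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For instance, six_frame_translation("ATGGCCTAA") gives "MA*" for frame +1 and "LGH" for frame -1, where "*" marks a stop codon.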
The Additional Information section provides
links to pages and features in SGD that provide further information
about a locus. When available, this section also contains
Gene Summary Paragraphs and a list of references for basic locus page information.
- Summary Paragraph: the
Gene Summary Paragraph provides a summary of published biological
information for a gene and its product that is designed to
familiarize both yeast and non-yeast researchers with the
general facts and important subtleties regarding a locus.
- References Cited On This Page: lists the
references used to curate information displayed in the Standard Name,
Alias, Description, Name Description, Gene Product and the Summary
Paragraph fields. Note that this section is not a comprehensive
listing of publications relevant to this gene. To retrieve a list of
all publications annotated to this gene, select the "View Complete
Literature Guide for [gene name]" link at the top of this section, or
the "Literature" tab at the top of the page.
- Additional Information Links:
- Locus History: retrieves notes about the locus, which are used
to alert users to nomenclature conflicts, gene name history, proposed
and completed sequence and annotation changes, mapping data, alternative
processing information and notes concerning repeated loci. For reserved
gene names, the
Locus History includes both the reservation and expiration date.
- Expression Connection: links to a form that allows you to
search gene expression data from multiple microarray experiments.
- Function Junction: links to the Function Junction search form
that allows users to simultaneously search a variety of functional
analysis project sites for all available functional information for a
given gene or ORF (already filled out with the gene (or ORF) name
for the locus).
- Gene/Sequence Resources: links to a form that allows you
to retrieve a list of options for accessing information available
for 1) a named gene or sequence, 2) a specified chromosomal region,
or 3) a raw DNA or protein sequence. This information includes
biological information, table/map displays, and sequence analysis
and retrieval options (already filled out with the gene (or ORF) name
when accessed via the locus page).
- Global Gene Hunter: links to a search form that allows the user
to simultaneously search one or more of six different databases for a
given gene name. The user selects the set of databases to be
queried allowing the rapid retrieval of information contained in
specific databases for a given gene.
- Mapping Data: retrieves a table summarizing the available
two-point genetic mapping data for that gene in SGD.
- Motifs: uses the eMOTIF search to query the eMOTIF database to
retrieve shared motif information about the query protein.
- PDB Homologs: retrieves information about proteins of known
structure in the PDB database that show sequence similarity to
the query protein.
- Protein Info: retrieves basic information about the gene product
including the sequence, predicted cleavage sites, predicted mass and pI,
amino acid composition, as well as links to other protein resources.
- Researchers: retrieves the
Colleague entry for researcher(s) that work on that gene; if
there is more than one contact person for a gene, this link retrieves
a list of Colleague entries.