The Protein Information page provides information on all protein-coding ORFs in the Saccharomyces cerevisiae genome. This page contains locus-specific nomenclature and gene product information, a brief description of the role of the gene product within the cell, the predicted primary sequence, basic information derived from the sequence, and a proteome viewer for enhanced visualization of sequence-specific features. Summary links provide access to detailed prediction-based and manually curated referenced information, as well as links to various external resources. Subtabs, located just below the main Protein tab at the top of the page, provide access to more detailed domain/motif information and physico-chemical properties. What follows is a description of the informational content presented on the Protein Information page.
The Basic Protein Information section contains several fields of information relevant to the protein of interest.
Protein nomenclature is listed first in this section and, for named genes, includes the following:
- Standard Name: reflects the standard locus name given to a gene by members of the scientific community, based on the SGD gene-naming guidelines, in the following format: relevant gene symbol, non-italic, initial letter uppercase, with the suffix 'p' appended
- Systematic Name: reflects the location-based systematic name given to the ORF during the genome sequencing project, in the following format: relevant gene symbol, non-italic, initial letter uppercase, with the suffix 'p' appended
- Alias Name: reflects the alias name given to a gene published under multiple names, in the following format: relevant gene symbol, non-italic, initial letter uppercase, with the suffix 'p' appended
- Reserved Name: reflects the soon-to-be-published reserved gene name registered with SGD in accordance with the gene-naming guidelines, in the following format: relevant gene symbol, non-italic, initial letter uppercase, with the suffix 'p' appended
Protein nomenclature is followed on the page by several descriptive fields and some basic protein information, including:
- ORF classification: a designation based on the feature qualifier (verified, uncharacterized, or dubious) that indicates the current degree of certainty that an ORF encodes a functional gene product
- Description: a concise summary of the biological role and molecular function of the protein and/or gene
- Name Description: contains the expanded form of the standard name, as described in the literature
- Gene Product: provides a description of the specific function(s) of the protein within the cell, using a controlled vocabulary
- Experimental Data: currently contains the number of molecules per cell, calculated using GFP fusion proteins and quantitative western blot analysis by Ghaemmaghami et al. (2003), as displayed on the yeast GFP fusion localization database website.
- Predicted Sequence: contains links to the GCG-formatted amino acid sequence displayed lower on the page and a button to download the sequence in FASTA format
- Length (a.a.): the predicted full length of the translated gene product, calculated using GCG's PEPTIDESORT
- Molecular Weight (Da): the predicted molecular weight of the full-length protein in daltons (Da), calculated using GCG's PEPTIDESORT
- Isoelectric Point (pI): the theoretical isoelectric point, i.e. the pH at which the protein carries no net charge, calculated using GCG's PEPTIDESORT
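The molecular weight and pI values above come from GCG's PEPTIDESORT. As a rough illustration of how such values can be derived from the primary sequence, the sketch below sums standard average residue masses (plus one water for the free termini) and locates the pI by bisection on the Henderson-Hasselbalch net charge. The pKa set, the function names (`clean`, `molecular_weight`, `net_charge`, `isoelectric_point`), and the example peptide are assumptions for illustration only; PEPTIDESORT may use different constants, so the numbers will not necessarily match those shown on SGD.

```python
"""Approximate molecular weight and isoelectric point of a protein sequence.

Sketch only: residue masses are standard average masses, and the pKa values
are an assumed set; GCG PEPTIDESORT may use different constants.
"""

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini

# Assumed pKa values for ionizable groups (not PEPTIDESORT's own table).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def clean(sequence: str) -> str:
    """Uppercase, drop spaces and a trailing stop '*', reject unknown letters."""
    seq = sequence.upper().replace(" ", "").rstrip("*")
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return seq


def molecular_weight(sequence: str) -> float:
    """Average molecular weight in Da: sum of residue masses plus one water."""
    seq = clean(sequence)
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a cleaned sequence at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    seq = clean(sequence)
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    peptide = "MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDW"  # hypothetical example
    print(f"{len(clean(peptide))} aa, {molecular_weight(peptide):.1f} Da, "
          f"pI {isoelectric_point(peptide):.2f}")
```

Bisection works here because the net charge decreases monotonically as pH rises, so there is a single crossing point between pH 0 and 14.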
To aid in the visualization of primary sequence-based protein information, an interactive Proteome Browser has been developed. The graphical image on the Protein Information page (see figure below) is a thumbnail image from the Proteome Browser. Clicking on the thumbnail opens the interactive Proteome Browser. This browser is a customized version of GBrowse, a genome browser developed by the Generic Model Organism Database (GMOD) project. The Proteome Browser consolidates the display of domains/motifs (predicted by software and datasets assembled by the InterPro database, using InterProScan), transmembrane domains (predicted using TMHMM), signal peptides (identified using SignalP), profile hits (using BlastProDom and ProfileScan, methods based on the generation of profiles from a family of related sequences derived through multiple sequence alignments), and Kyte-Doolittle hydropathy plots.
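A Kyte-Doolittle hydropathy plot is a sliding-window average of per-residue hydropathy values. The sketch below assumes the published Kyte-Doolittle scale and a hypothetical `hydropathy_profile` helper; the window size the Proteome Browser actually uses is not stated on this page, so the default of 9 residues is an assumption.

```python
from __future__ import annotations

# Kyte & Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[tuple[int, float]]:
    """Return (centre position, mean hydropathy) for each full window.

    Positions are 1-based. Windows that would run off either end are skipped,
    so a sequence shorter than the window yields an empty list. Residues
    outside the 20 standard amino acids raise KeyError.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    seq = sequence.upper().rstrip("*")
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))  # centre residue of the window
    return profile


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
    example = "MKKDE" + "LIVALLFAGVILLA" + "KRDEK"
    for pos, score in hydropathy_profile(example, window=9):
        print(f"{pos:3d} {score:+.2f}")
```

Peaks in this profile mark hydrophobic stretches; note that the transmembrane predictions on this page come from TMHMM, not from a hydropathy threshold.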
In both the thumbnail and the interactive Proteome Browser, HMM domains are color-coded by the source of the prediction: PIR SUPERFAMILY domains in red, PFAM domains in orange and yellow, GENE3D domains in purple, PANTHER domains in green, TIGRFAM domains in blue, and SMART domains in brown. In the Proteome Browser, a mouseover feature provides additional detail about the feature of interest. For example, mousing over a domain shows the database origin of the domain match, the name and description of the domain, and the E-value of the match.
To view a different protein, first click on the thumbnail image to open the Proteome Browser, then enter the protein name in the landmark or region text box. The scroll/zoom feature can be used to change the region of the protein shown in the default view. The default setting displays the predicted full-length protein, and the zoom option can be used to look at a particular region in more detail (zooming in). Note that one cannot zoom out. Tracks shown in the default view can be modified by selecting or deselecting the tracks of interest and then updating the image. User-defined tracks of information can also be displayed by uploading the file of interest. Additional information about the functionality of the Proteome Browser can be found in the general GBrowse help document, since the underlying code and functionality of the two viewers are the same.
This section of the page contains summary statements relevant to the information type listed, as well as links to internal resources relevant to the protein of interest. Summary statements listed in this section include:
- Domains/Motifs: contains a link to a table that summarizes information about domains/motifs that are located in the query protein and shared with other yeast proteins. The table also lists InterPro domains/motifs found in these other S. cerevisiae proteins but not in the query protein sequence. See the Domains/Motifs help page for more details.
- Transmembrane Domains: provides a summary statement regarding the number of transmembrane domains predicted for the query protein. The TMHMM software uses a hidden Markov model (HMM) to predict the location and orientation of transmembrane domains in the query protein. See the Domains/Motifs help page for more details.
- Signal Peptides: provides a summary statement regarding the number of signal peptides predicted for the query protein, based on the signal sequences identified by the SignalP software. SignalP uses neural networks and hidden Markov models to predict signal peptides. See the Domains/Motifs help page for more details.
- Physical Interactions: provides a summary link to the complete list of curated physical interactions between the query protein and other yeast proteins, organized according to the technique used to identify them. Each curated physical interaction includes information on which protein was used as bait and which was the hit, as well as the source of the data, the interaction type (the type of experiment used to identify the interaction), and the associated reference. Note: this number reflects the total number of reported interactions, so the same pair of proteins may be counted more than once when the reports differ only in the technique used to identify the interaction or in the publication from which they were extracted. See the Physical and Genetic Interactions help page for more details.
- Homologs: provides links to several internal resources that can be used to identify proteins with sequence similarity to the query protein. These include the following tools:
- PDB homologs presents information on proteins of known structure with sequence similarity to the query protein.
- BLASTP uses the Basic Local Alignment Search Tool to compare the amino acid sequence of the query protein against S. cerevisiae sequence datasets.
- PSI-BLAST displays the results of an iterative search that compares the query protein against the UniRef90 protein dataset. As sequence hits accumulate in each iteration, the search profile is rebuilt from all sequences identified in that round. This iterative search is therefore very good at identifying broadly related protein families.
- Best Hits presents the results of an NCBI BLASTP analysis comparing the amino acid sequence of the query protein against the predicted protein sequences of several model organisms.
- Fungal Alignment displays the alignments between the amino acid sequence of the query protein and the sequences of orthologs from several closely related sensu stricto and sensu lato species of Saccharomyces.
- Synteny Viewer displays the degree of synteny shared among chromosomes in closely related Saccharomyces species (S. paradoxus, S. mikatae, and S. bayanus).
- BLASTP v. fungi (fungal BLAST search) uses the Basic Local Alignment Search Tool to compare the amino acid sequence of the query protein against multiple fungal protein sequence datasets.
- Secondary Structure Prediction: provides links to several internal resources that predict the secondary structure and related properties of the query protein:
- GCG:PepPlot
PepPlot shows several common measures of protein secondary structure together on one coordinated plot. Most of the curves are the average, sum, or
product of some residue-specific attribute within a window. In a few cases, the attribute is both specific to the residue and dependent on its
position in the window. Throughout the plot, the blue curves are for beta-sheets and the red curves are for alpha-helices; black is used for turns and
hydropathy. For further details, please see the GCG manual for
PepPlot.
- GCG:Peptide Structure
PeptideStructure makes predictions of the following features of an amino acid sequence:
- Secondary structure according to the Chou-Fasman method
- Secondary structure according to the Garnier-Osguthorpe-Robson method
- Hydrophilicity according to either the Kyte-Doolittle or Hopp-Woods method
- Surface probability according to the Emini method
- Flexibility according to the Karplus-Schulz method
- Glycosylation sites
- Antigenic index according to the Jameson-Wolf method
For further details, please see the GCG manual for
Peptide Structure.
- GCG:Helical Wheel
HelicalWheel plots a helical wheel representation of a peptide sequence. Each residue is offset from the preceding one by 100 degrees, the typical angle of rotation for
an alpha-helix. For further details, please see the GCG manual for
HelicalWheel.
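The 100-degree offset used by HelicalWheel is easy to reproduce. The sketch below, using a hypothetical `helical_wheel` function, places residue i (counting from 0) at angle (i × 100°) mod 360 on a unit circle. Because 18 × 100° = 1800°, exactly five full turns, residue 19 lands back on the position of residue 1. The starting angle (0°, on the x-axis) and counterclockwise direction are arbitrary choices here and may differ from GCG's plot.

```python
import math


def helical_wheel(sequence: str, step_deg: float = 100.0, radius: float = 1.0):
    """Return (residue, angle_deg, x, y) for each residue on a helical wheel.

    Each residue is rotated step_deg from the previous one; angles are reduced
    modulo 360 and measured counterclockwise from the positive x-axis.
    """
    points = []
    for i, aa in enumerate(sequence.upper()):
        angle = (i * step_deg) % 360
        rad = math.radians(angle)
        points.append((aa, angle, radius * math.cos(rad), radius * math.sin(rad)))
    return points


if __name__ == "__main__":
    # Hypothetical amphipathic-looking peptide, first four positions only.
    for aa, angle, x, y in helical_wheel("LKKLLKLLKKLLKL")[:4]:
        print(f"{aa} {angle:5.0f} deg  ({x:+.2f}, {y:+.2f})")
```

Plotting the (x, y) pairs and labelling each point with its residue gives the familiar wheel, on which hydrophobic residues of an amphipathic helix tend to cluster on one face.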
This section provides access to a number of external resources relevant to the query protein, including sequence entries at various external sequence databases, homolog-related resources, interaction databases, protein databases, and localization resources. Each is described briefly below, in order.
- External Sequence Databases
These links provide access to a compendium of Saccharomyces cerevisiae sequence entries for alleles and strains located in various external databases, including GenBank, EMBL, DDBJ, NCBI, EBI, UniProt/Swiss-Prot, and MIPS. Sequence entries are listed by accession and/or version number according to the source. Additional information is available on the All Associated Sequences help page.
- External Classifications
This section lists Enzyme Commission (EC) and/or Transporter Classification (TC) number assignments. EC assignments were made by UniProtKB/Swiss-Prot curators, while TC assignments were made as part of the Yeast Transporter Information (YETI) project at Genolevures (De Hertogh B, et al. (2006) Genetics 172(2):771-81).
- Homologs
These links provide access to several sources of homolog information, when available for the requested protein.
- BLASTP (NCBI): a direct link to BLASTP at NCBI, which compares the amino acid sequence of the query protein to all non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples
- YOGY: the eukarYotic OrtholoGY (YOGY) tool is used to view orthologous proteins from eukaryotic organisms (Homo sapiens,
Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Schizosaccharomyces pombe, and
Saccharomyces cerevisiae). YOGY provides information from KOGs, Inparanoid, Homologene, OrthoMCL, and manually curated orthologs between S. cerevisiae
and S. pombe. YOGY was developed by the
Fission Yeast Functional Genomics Team
at the Wellcome Trust Sanger Institute, Cambridge, UK.
- YGOB: a tool used to visualize the syntenic context of protein coding genes from S. cerevisiae, S. castellii, C. glabrata, A. gossypii,
K. lactis, K. waltii, and S. kluyveri. This tool was developed by Kevin Byrne and Ken Wolfe (Trinity College, Dublin, Ireland), as described in Byrne and Wolfe.
- Ashbya (AGD): provides a direct link between the S. cerevisiae protein and the Ashbya gossypii ortholog at the Ashbya Genome Database (AGD) located at the University of Basel
- Candida (CGD): provides a direct link between the S. cerevisiae protein and the Candida albicans
ortholog at the Candida Genome Database
(CGD) located at Stanford University
- Candida (CandidaDB): provides a direct link between the S. cerevisiae protein and the homolog from the fungal pathogen Candida albicans at CandidaDB, located at the Institut Pasteur
- Interaction Resources
These links provide access to several external databases containing both genetic and physical interaction data, including the Yeast Resource Center Informatics Platform at the University of Washington, the Biomolecular Interaction Network Database (BIND) at the University of Toronto, the Database of Interacting Proteins (DIP) at UCLA, the General Repository for Interaction Datasets (GRID) at the University of Toronto, and the PathCalling Yeast Interaction Database at CuraGen Corporation.
- Protein databases/Other
These links provide access to structural assignments of protein sequences at the superfamily level, using the SCOP superfamily; Enzyme Commission designations for enzymes, generated by entering the Enzyme Commission (EC) number into the search program at NiceZyme; information on the protein from the Munich Information Center for Protein Sequences (MIPS), generated by entering the ORF name into the MIPS database search program; and mass spectrometry data at GPM DB.
- Localization Resources
These links provide access to several external databases that contain localization data for many yeast proteins, including the Yeast GFP Localization Database at UCSF, the YGAC Triples Database at Yale, the Yeast Protein Localization Database at the University of Graz, and the Yeast Resource Center at the University of Washington.