SGD Glossary Terms
- 2_point_data
- This refers to data generated by tetrad analysis of a cross in
which the segregation of 2 genetic markers is followed. These data
yield the distance between the 2 markers (usually mutant alleles of
genes) on the genetic map.
- 5' UTR intron
- An intron located in the 5' UTR (SO:0000447).
- Accession number
- This refers to the unique GenBank identifier a sequence has been
assigned. This number can be used to search GenBank records for a
specific sequence.
- AceDB
- AceDB was the database software formerly used by SGD. SGD has since
moved to an Oracle relational database and no longer uses AceDB.
- Affinity capture-MS
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
inferred when a "bait" protein is affinity captured from cell extracts by either
polyclonal antibody or epitope tag and the associated interaction
partner is identified by mass spectrometric methods.
- Affinity capture-RNA
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
inferred when a "bait" protein is affinity captured from cell extracts
by either polyclonal antibody or epitope tag and the associated
interaction partner is identified by specific RNA binding.
- Affinity capture-Western
- This term is used to identify and describe
interaction data displayed at SGD. In this type of experiment, an
interaction is inferred when a "bait" protein is affinity captured
from cell extracts by either polyclonal antibody or epitope tag and
the associated interaction partner is identified by Western blotting
with a specific polyclonal antibody or second epitope tag.
- Affinity Chromatography
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
detected by chromatographic purification (for example, GST fusions
purified with glutathione-Sepharose beads).
- Affinity Precipitation
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, a "bait" protein is affinity
captured from cell extracts by either polyclonal antibody or epitope
tag and the associated interaction partner is identified by immunoblot
with a specific polyclonal antibody or second epitope tag. This
category is also used if an interacting protein is visualized
directly by dye stain or radioactivity.
- Alias
- 'Alias' refers to a non-standard name for a locus. When multiple
names are published for a locus, one name is designated the standard
name (following the Gene naming guidelines) and the other published names are retained under
'Alias'. If a name has been reserved for a gene for a significant
period of time, it will also be retained as an alias even if it has
not been published.
- Alignment
- A presentation of two compared sequences that shows
the regions of greatest statistical similarity.
- Annotation
- At SGD, annotation refers to information that has been extracted
from the literature and associated, on the database pages, with
various aspects of an S. cerevisiae gene or chromosomal feature. SGD makes several
types of annotations, such as GO, Sequence, and Literature Guide annotations.
- Anonymous FTP
- A method of sharing files on the Internet. A variety of software
that can provide FTP function is available in most networking software
packages. Anonymous FTP simply means a computer will allow anyone
using the FTP software access to a special directory of files on its
disk drive. This service is called Anonymous FTP because the user
name used is "anonymous." When asked for a password, simply enter
your e-mail address. A stand-alone version of SacchDB can be
transferred using Anonymous FTP.
- AmiGO
- A web application developed by the Gene Ontology (GO) Consortium
that can be used to search, browse and
visualize Gene Ontology data. AmiGO displays detailed information
related to GO terms and the gene products annotated to those
terms. Using AmiGO, it is possible to access GO annotations for the
many different species for which GO annotations have been submitted to
the GO Consortium.
- Aromaticity score (Aromo)
- This index is the frequency of aromatic amino acids (Phe, Tyr,
Trp) in the hypothetical translated gene product. The hydropathicity
and aromaticity protein scores are indices of amino acid usage. The
strongest trend in the variation in the amino acid composition of
E. coli genes is correlated with protein hydropathicity, the second
trend is correlated with gene expression, while the third is
correlated with aromaticity (Lobry and Gautier 1994). The variation in
amino acid composition can
have applications for the analysis of codon usage. If total codon
usage is analyzed, a component of the variation will be due to
differences in the amino acid composition of genes.
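The aromaticity index described above can be sketched in a few lines. This is a minimal illustration that assumes one-letter amino acid codes as input; the function name `aromaticity` is hypothetical and this is not the CodonW implementation.

```python
# Aromatic residues named in the definition: Phe (F), Tyr (Y), Trp (W).
AROMATIC = frozenset("FYW")

def aromaticity(protein: str) -> float:
    """Return the fraction of residues that are Phe, Tyr, or Trp."""
    # Keep letters only, so a trailing '*' (stop symbol) is ignored.
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    return sum(aa in AROMATIC for aa in residues) / len(residues)

# Three aromatic residues (F, Y, W) in a ten-residue peptide.
score = aromaticity("MFYWAAAAAA")
```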
- ARS Consensus Sequence (ACS)
- The ACS is an 11-bp sequence of the form 5'-WTTTAYRTTTW-3' which is at the core of every yeast ARS, and is necessary but not sufficient for recognition and binding by the origin recognition complex (ORC). Functional ARSs require an ACS, as well as other cis elements in the 5' (C domain) and 3' (B domain) flanking sequences of the ACS.
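As an illustration, the consensus 5'-WTTTAYRTTTW-3' can be turned into a regular expression using the IUPAC codes W = A or T, Y = C or T, and R = A or G. The sketch below scans only the strand it is given, and the helper names are hypothetical. Because the ACS is necessary but not sufficient, a match marks a candidate site only, not a functional ARS.

```python
import re

# IUPAC nucleotide codes needed for this consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]"}

def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC[base] for base in consensus)

ACS_PATTERN = consensus_to_regex("WTTTAYRTTTW")

def find_acs(seq: str) -> list:
    """Return 0-based start positions of ACS matches on the given strand."""
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(f"(?=({ACS_PATTERN}))", seq.upper())]

hits = find_acs("GGATTTATGTTTACC")
```

To search the opposite strand as well, the same function could be applied to the reverse complement of the sequence.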
- Associate
- In Colleague information, "Associate" refers to coworkers or collaborators.
- ATCC
- American Type Culture Collection;
maintains collections of yeast strains and clones.
- Author
- An author of a paper or personal communication included in SGD. When
searching for an individual's name, use the "*" wildcard character
(e.g., Johnson*) to achieve the best results.
- [Search Authors Help]
- Autonomously Replicating Sequence (ARS)
- A DNA sequence element occurring on average every 40 kb in yeast and originally defined by its ability to confer replication on extrachromosomal circular DNA molecules. ARS elements correspond to chromosomal origins of replication (ORIs), tend to be A/T rich, and have been implicated in the binding of the primosome complex.
- Binding site
- Consensus sequence to which a specific molecule binds.
- Biochemical Activity
- This term is used to identify and describe interaction data displayed at
SGD. In this type of experiment, an interaction is inferred from the
biochemical effect of one protein upon another, for example, GTP-GDP
exchange activity or phosphorylation of a substrate by a kinase.
- BioGRID
- BioGRID (General Repository for Interactions database) is a database of genetic and physical interactions developed by
the Tyers Group at Mount Sinai Hospital, Toronto, Canada. It contains interaction data from many sources, including several genome/proteome-wide studies, the MIPS database, and BIND.
- Biological process
- One of the three categories used by the Gene Ontology project,
biological process describes broad biological goals, such as
mitosis or purine metabolism.
- BioSci
- BIOSCI is a set of Internet newsgroups and e-mail lists for
biologists. SGD maintains the yeast BIOSCI archives.
- BLAST
- Basic Local Alignment Search Tool is a search algorithm developed by
Altschul et al. (1990). It is a very fast search algorithm that is
used by the blastn, blastp, and blastx programs to separately search
protein or DNA databases. BLAST
is best used for sequence similarity searching, rather than for motif
searching.
- blastn
- A BLAST
program that compares a nucleotide query sequence against a nucleotide
sequence database. The user must enter a NUCLEOTIDE sequence and
select a DNA database (genoSc or GenBank) to search.
- blastp
- A BLAST
program that compares an amino acid query sequence against a protein
sequence database. The user must submit an AMINO ACID sequence and
select a PROTEIN database (NRSC) for the search.
- blastx
- A BLAST
program that compares the six-frame conceptual translation products of
a nucleotide query sequence (both strands) against a protein sequence
database. The user must enter a NUCLEOTIDE sequence and select a
PROTEIN database (NRSC) for the search.
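To make "six-frame conceptual translation" concrete, the sketch below translates a nucleotide sequence in three reading frames on each strand. It assumes the standard genetic code and is not the BLAST implementation; the function names and frame labels ("+1" to "-3") are illustrative choices.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna: str) -> dict:
    """Translate both strands in all three reading frames."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
```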
- BLOSUM100
- An alternative scoring matrix for BLAST searches.
- BLOSUM30
- An alternative scoring matrix for BLAST searches.
- BLOSUM50
- A scoring matrix that is used as the default in FASTA searches.
- BLOSUM62
- A scoring matrix that is used as the default in blastp, blastx, and tblastn BLAST searches.
- CDS
- CoDing Sequence: the region of nucleotides that
corresponds to the sequence of amino acids in the predicted protein. The CDS includes the start and stop codons; coding sequences therefore begin with an "ATG" and end with a stop codon.
In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
introns, or bases not expressed due to frameshifting, are not included
within a CDS. Note that the CDS does not correspond to the actual mRNA sequence.
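The start and stop codon rule above lends itself to a quick sanity check. The sketch below makes two assumptions not stated in the definition: it uses the standard nuclear stop codons (TAA, TAG, TGA; the yeast mitochondrial code differs), and it requires the length to be a multiple of three. The function name is hypothetical.

```python
# Stop codons of the standard nuclear genetic code (assumption).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def looks_like_cds(seq: str) -> bool:
    """Check that a sequence starts with ATG, ends with a stop codon,
    and consists of whole codons."""
    seq = seq.upper()
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )

ok = looks_like_cds("ATGGCCTAA")
```

This check does not look for internal stop codons, so it is a first filter rather than a full validation.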
- centiMorgan
- The unit of linkage that refers to the distance between two gene
loci determined by the frequency with which recombination occurs
between them. Two loci are said to be one centiMorgan apart if recombination is observed between them in 1% of
meioses (from the Genetics Home
Reference at the NIH). In yeast, recombination frequency is assayed by tetrad
analysis. A centiMorgan is equivalent to a map unit (m.u.). The
centiMorgan is named after the geneticist Thomas
Hunt Morgan.
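The glossary does not say how tetrad data are converted into centiMorgans. One widely used choice is the Perkins formula, which estimates map distance from the counts of parental ditype (PD), nonparental ditype (NPD), and tetratype (TT) tetrads. The sketch below assumes that formula; it is not necessarily the calculation used for the SGD genetic map.

```python
def map_distance_cm(pd: int, npd: int, tt: int) -> float:
    """Perkins formula: cM = 100 * (TT + 6 * NPD) / (2 * (PD + NPD + TT))."""
    total = pd + npd + tt
    if total == 0:
        raise ValueError("no tetrads scored")
    return 100.0 * (tt + 6 * npd) / (2 * total)

# Hypothetical cross: 80 PD, 0 NPD, and 20 TT tetrads.
distance = map_distance_cm(80, 0, 20)
```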
- Centromere
- This term refers to the portion of a chromosome where the
kinetochore assembles. The kinetochore attaches chromosomes to mitotic
and meiotic spindles. Thus, the centromere is critical for the proper segregation of chromosomes during mitosis
and meiosis. In S. cerevisiae, the centromeres (CENs) are composed of specific
DNA sequences (CDEI, CDEII, and CDEIII), though in most eukaryotes this is not the
case. While the physical position of a gene is given in kilobase pairs, with 1 bp
located at a telomere, the
genetic position of a gene is given relative to the centromere.
- Centromere DNA Element I (CDEI)
- Smallest of three adjacent centromeric domains, CDEI is an 8-11 bp consensus sequence that is bound by centromere binding factor 1 (Cbf1p).
- Centromere DNA Element II (CDEII)
- Central of three adjacent centromeric domains, CDEII is AT-rich and ~75-100 bp in length.
- Centromere DNA Element III (CDEIII)
- Most essential of three adjacent centromeric domains, CDEIII consists of a 25-bp consensus sequence and provides the binding site for the centromere DNA binding factor 3 (CBF3) complex.
- Cellular Component
- One of the three categories used by the Gene Ontology project, cellular
component encompasses subcellular structures, locations, and
macromolecular complexes. Examples include nucleus,
telomere, and origin recognition complex.
- Child Term
- This term is used in the context of the Gene Ontology. It refers to a
controlled vocabulary term that describes a more specific, or granular, aspect
of biology than its parent term or terms. Child terms are
placed lower in the ontology than their parent terms.
For example, endoplasmic reticulum and Golgi apparatus are child terms of
the parent term cytoplasm.
- Chr_Basepair_Coord
- Chromosome basepair coordinates consist of two numbers that specify
the beginning and ending location of the sequence as positioned on the
chromosomal sequence.
- Chromosome
- Chromosome refers to the structure in the cell composed of a very long molecule of DNA and associated proteins called histones. At SGD, if a locus has been physically mapped, the chromosomal coordinates will
appear under the Sequence Coordinates category with a link to the ORF Map,
on the Locus page. The Roman numeral to the right
indicates the chromosome to which the locus maps. There are 16
chromosomes in S. cerevisiae. The Genomic
View is a graphic representation of the entire yeast genome that
allows you to display a chromosomal features map, physical map, or
combined physical and genetic map.
- Chromosome arm
- The part of a chromosome that includes the DNA sequence from one
telomere to the centromere. Usually one arm is physically longer than the other
arm. In humans the short arm is designated as 'p' (petite) and the long arm
is called 'q' (the letter following p in the Latin alphabet). Before
the S. cerevisiae genome sequence was determined, yeast chromosomal arms were
designated "left" or "right", where the left arm was the shorter one
based on genetic position and recombination frequencies of the genes it carried. Subsequent sequence information showed that a genetically
short arm may be physically longer; however, the genetic designations
are still used today in yeast gene names. For nomenclature
information, see ORF-naming
conventions. To see differences between physically and
genetically-defined chromosomes, see the Combined
Physical and Genetic Map.
- Clone
- Clone is the term used for any physical piece of DNA that has been
localized to a particular region of a chromosome. A prime clone is
any piece of DNA that is available from the ATCC; these are mostly the
Olson-Riles
set of cosmid and lambda clones, as well as many of the cosmid and
lambda clones sequenced by the systematic sequencing groups. All
clones appear on the physical map
display for the chromosome on which they reside as a green line
indicating the relative length of the clone.
- [Search Clones Help]
- ClustalW
- Clustal W is an alignment program for DNA and proteins with
improved sensitivity for the alignment of divergent protein sequences.
[Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W:
improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and weight
matrix choice. Nucleic Acids Res. 22:4673-80.]
- Co-crystal Structure
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
directly demonstrated at the atomic level by X-ray crystallography.
- Coding Exon
- An exon that directs the production of a peptide sequence.
- Coding Sequence
- See CDS.
- Codon Adaptation Index (CAI)
- Codon
adaptation index is a measurement of the relative adaptiveness of the
codon usage of a gene towards the codon usage of highly expressed
genes. The relative adaptiveness (w) of each codon is the ratio of the
usage of each codon, to that of the most abundant codon for the same
amino acid. The CAI index is defined as the geometric mean of these
relative adaptiveness values. Non-synonymous codons and termination
codons (dependent on genetic code) are excluded. CAI values range
from 0 to 1, with higher values indicating a higher proportion of the
most abundant codons. [Sharp, P. M., and W. H. Li (1987). The codon
adaptation index--a measure of directional synonymous codon usage bias,
and its potential applications. Nucleic Acids Research 15:
1281-1295. See also: Jansen R., Bussemaker H.J., and Gerstein M. (2003) Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models. Nucleic Acids Res. 31(8):2242-51.]
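The CAI definition can be sketched as follows: relative adaptiveness values (w) are derived from a reference set of highly expressed genes, and the CAI of a gene is the geometric mean of w over its scorable codons. The sketch assumes the standard genetic code and gives codons that never occur in the reference set a count of 0.5 so that the logarithm is defined. That pseudocount and all function names are illustrative choices, not taken from the cited papers.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(cds: str) -> list:
    """Split a coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]

def relative_adaptiveness(reference_cds: list) -> dict:
    """Compute w for each synonymous codon from highly expressed genes."""
    counts = Counter(c for cds in reference_cds for c in codons(cds))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # termination and non-synonymous codons are excluded
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: unseen codons get a pseudocount of 0.5.
            w[c] = max(counts[c], 0.5) / top
    return w

def cai(cds: str, w: dict) -> float:
    """Geometric mean of w over the scorable codons of a gene."""
    logs = [math.log(w[c]) for c in codons(cds) if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))

# Toy reference set: GCT is used three times as often as GCC.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
value = cai("GCTGCC", w)
```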
- Codon Bias Index (CBI)
- Codon bias index is another measure of directional codon bias; it
measures the extent to which a gene uses a subset of optimal codons. CBI
is similar to Fop, with expected usage used
as a scaling factor. In a gene with extreme codon bias, CBI will equal
1.0; in a gene with random codon usage, CBI will equal 0.0. Note that
it is possible for the number of optimal codons to be less than
expected by random chance. This results in a negative value for
CBI. [Bennetzen, J. L., and B. D. Hall (1982). Codon selection in
yeast. Journal of Biological Chemistry 257: 3026-3031.]
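The behavior described above (1.0 for extreme bias, 0.0 for random usage, negative when optimal codons are rarer than expected) matches the commonly quoted form CBI = (N_opt - N_rand) / (N_tot - N_rand). The sketch below assumes that form; the argument names are hypothetical, and the caller is expected to supply the expected count for random codon usage.

```python
def codon_bias_index(n_optimal: float, n_expected: float, n_total: float) -> float:
    """CBI from counts of optimal codons (observed), optimal codons
    expected under random usage, and all codons considered."""
    if n_total == n_expected:
        raise ValueError("expected count equals total; CBI is undefined")
    return (n_optimal - n_expected) / (n_total - n_expected)

# Hypothetical gene: 100 codons, 40 optimal codons expected by chance.
extreme = codon_bias_index(100, 40, 100)   # only optimal codons used
random_use = codon_bias_index(40, 40, 100) # exactly the expected number
below = codon_bias_index(30, 40, 100)      # fewer than expected
```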
- CodonW
- CodonW is
a software program, written by John Peden in the lab of Paul Sharp
(Dept of Genetics, University of Nottingham), that analyzes the
correspondence between amino acids and codon usage in a set of protein
sequences, based on a given genetic code, to calculate values such as
Codon Adaptation Index and Codon
Bias Index. Decisions regarding whether an amino acid is
synonymous or non-synonymous, the translation of a codon, the number
of codons in a codon family, how many synonyms a codon has, are all
determined at run time. Seven alternatives to the universal genetic
code, including S. cerevisiae chromosomal and
S. cerevisiae mitochondrial, are built into the
program.
- Co-fractionation
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
inferred from the presence of two or more protein subunits in a
partially purified protein preparation.
- Colleagues
- Colleagues
is a searchable list of yeast researchers with their addresses (Internet
and postal) and phone numbers. Colleague information may also include
research interests, web pages, and links to other Colleague entries
for lab members, lab heads, or collaborators.
- [Search Colleagues Help]
- Co-localization
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
inferred from co-localization of two proteins in the cell, including
co-dependent association of proteins with promoter DNA in chromatin
immunoprecipitation experiments.
- Comparison Matrix
- Programs used to align and identify regions of sequence
similarity. The SGD Sequence
Similarity Viewer uses comparison matrices to compare all yeast
chromosomes by yeast chromosomes.
- Computational GO Annotations
- Computational GO annotations are made by a variety of
computational methods, such as sequence similarity methods, including
protein domains and motifs, and keyword mapping files. When annotations based on computational methods are NOT reviewed by a curator, they are placed in the Computational GO annotations section. Currently, all computational GO annotations for S. cerevisiae are assigned by an external source (for example, the Gene Ontology Annotation (GOA) project of the European Bioinformatics Institute (EBI)). Note that the criterion for including a GO annotation in this section is whether or not it was reviewed by a curator; when annotations made by a computational method, such as sequence analysis, are reviewed by a curator, they may be found in the Manually curated section.
- Co-purification
- This term is used to identify and describe interaction data displayed
at SGD. In this type of experiment, an interaction is inferred from
the identification of two or more protein subunits in a purified
protein complex, as obtained by classical biochemical fractionation or
affinity purification and one or more additional fractionation steps.
- Contact
- If a gene name is reserved for a feature, then SGD provides the name of the researcher who has reserved it as the 'Contact' under the 'Locus History' section.
The name of the "contact" is linked to
the address information for that person under the Colleague section.
- Contained_Loci
- A list of loci that are contained
within the clone.
- Correspondence Analysis (COA)
- Correspondence analysis is an ordination technique that identifies the
major trends in the variation of the data and distributes genes along
continuous axes in accordance with these trends. Correspondence analysis
has the advantage that it does not assume that the data fall into
discrete clusters and therefore can represent continuous variation
accurately.
- Crick Strand ORF
- An open reading frame (ORF) encoded on the Crick or bottom strand
of the chromosome, which runs 5' to 3' from the right to left ends of
the chromosome.
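Because a Crick strand ORF reads 5' to 3' from right to left, its sequence is the reverse complement of the corresponding stretch of the top (Watson) strand. The sketch below illustrates this with hypothetical names; it takes 1-based inclusive coordinates with left <= right plus an explicit strand flag, which is a simplification and not necessarily how SGD stores coordinates.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]

def feature_sequence(chromosome: str, left: int, right: int, strand: str) -> str:
    """Extract a feature from the top-strand sequence.

    left/right are 1-based inclusive positions; strand is 'W' or 'C'.
    """
    if not 1 <= left <= right <= len(chromosome):
        raise ValueError("coordinates out of range")
    segment = chromosome[left - 1:right]
    if strand == "W":
        return segment
    if strand == "C":
        return reverse_complement(segment)
    raise ValueError("strand must be 'W' or 'C'")

# Toy chromosome whose bottom strand carries the ORF ATGTAA.
orf = feature_sequence("GGTTACATCC", 3, 8, "C")
```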
- Curator
- A keeper of the Saccharomyces Genome Database information,
responsible for collecting and compiling data about yeast genetic loci
and DNA sequences and providing online assistance to users of the
database. The SGD
Staff page lists all current yeast curators.
- DAG
- Directed Acyclic Graph (DAG) refers to a way of arranging objects
based on their relationships and allows a child to have multiple parents.
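A small sketch shows what "a child may have multiple parents" means in practice. The graph is stored as a mapping from each term to its list of parents, and the ancestors of a term are collected by walking upward. The term names are placeholders, not real ontology terms.

```python
# Each term maps to its direct parents; "child" has two parents.
PARENTS = {
    "root": [],
    "parent A": ["root"],
    "parent B": ["root"],
    "child": ["parent A", "parent B"],
}

def ancestors(term: str, parents: dict) -> set:
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

result = ancestors("child", PARENTS)
```

In a tree, each node would have exactly one parent; allowing a list of parents is what makes this structure a DAG.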
- DB_info
- Identifies the database source of information.
- DDBJ
- DNA DataBase of Japan. DDBJ is a
repository of DNA sequences. DDBJ is produced in collaboration with
GenBank and EMBL.
- Deleted Feature
- A chromosomal feature that has been removed from the yeast genome
catalog. Typically, features are "Deleted" because they are
effectively destroyed by a sequence or annotation change
(e.g. YCL006C),
or because the original annotation was in error or inappropriate
(e.g. YCRX03C).
For record keeping, the "Deleted" feature is not removed from SGD, but
is instead given "Deleted" status as a flag. Note that "Deleted"
features are distinct from "Dubious" features in
that "Deleted" features have been demonstrated to be incorrect and have been
officially withdrawn.
- Dendrogram
- A branching tree-like diagram that illustrates the hierarchical relationships among items in a dataset; for example, the relationships among protein sequences of different organisms can be represented by a dendrogram.
- Description
- A brief description of the role that the gene plays in the cell, or a
general description of the gene product.
- Dosage Growth Defect
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, a genetic interaction
is inferred when overexpression or increased dosage of one gene causes
a growth defect in a strain that is mutated or deleted for another
gene.
- Dosage Lethality
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, overexpression or increased dosage of one gene
causes lethality in a strain that is mutated or deleted for another
gene.
- Dosage Rescue
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, a genetic interaction
is inferred when overexpression or increased dosage of one gene
rescues the lethality or growth defect of a strain that is mutated or
deleted for another gene.
- Dubious ORF
- A Dubious open reading frame (ORF) is one that is unlikely to encode
an expressed protein. Dubious ORFs may meet some or all of the
following criteria:
1) the ORF is not
conserved in other Saccharomyces species; 2) there is no
well-controlled, small-scale, published experimental evidence that a
gene product is produced; 3) a phenotype caused by disruption of the
ORF can be ascribed to mutation of an overlapping gene; and 4) the ORF
does not contain an intron. Many ORFs classified as "Dubious" are
small and overlap a larger ORF of the class "Verified" or
"Uncharacterized";
however, overlap with another ORF does not mandate
that an ORF be classified as "Dubious."
- Epistatic Miniarray Profile (E-MAP)
- A method that creates and quantifies high density genetic
interaction maps. In this method, observed double mutant colony sizes
are compared to those that would be expected from a distribution of
typical double mutant colonies of each strain. Each interaction is
assigned a score which indicates the magnitude of the difference from
the expected value and the certainty of the score. A negative (or
aggravating) score < -3 would imply a synthetic sick/lethal interaction,
and a positive (or alleviating) score > +3 would imply a suppressor
interaction (Schuldiner M, et al. (2005)).
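The score thresholds quoted above translate into a simple classification rule. The sketch below assumes the cutoffs of -3 and +3 given in this entry; the function name and the "no call" label for intermediate scores are illustrative.

```python
def classify_emap_score(score: float, threshold: float = 3.0) -> str:
    """Label an interaction score using symmetric cutoffs."""
    if score < -threshold:
        return "synthetic sick/lethal (aggravating)"
    if score > threshold:
        return "suppressor (alleviating)"
    return "no call"

labels = [classify_emap_score(s) for s in (-5.2, 0.4, 3.8)]
```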
- EC number
- The number assigned by the Enzyme Commission for a particular
enzyme activity. Currently, SGD contains EC assignments to
individual proteins, made by
UniProtKB/Swiss-Prot
curators. EC numbers assigned to individual proteins
are displayed in the "External Classifications" section of Protein
Information pages, and protein-specific links to the Enzyme nomenclature database are listed in the external links
sections of both the Locus Summary and Protein Information pages. These assignments are also included in the dbxref.tab file on
our FTP site.
- EMBL
- European Molecular Biology Laboratory. The EMBL
Nucleotide Sequence database is a comprehensive database of DNA and
RNA sequences. The database is produced in collaboration with GenBank and the DNA Database of Japan (DDBJ).
- Entrez
- The Entrez
Search System was developed by NCBI. Entrez allows you to
retrieve molecular biology data and bibliographic citations from
integrated nucleotide (GenBank, DDBJ, EMBL), protein (Swiss-Prot, PIR, PRF, PDB), and bibliographic (PubMed) databases. Within SGD database
pages, external links are provided to one or more of these databases.
- Epistasis
- A type of genetic interaction: the nonreciprocal interaction of
nonallelic genes in which the expression of one gene masks the
expression of another. For example, if the expression of Gene A masks
that of Gene B, Gene A is said to be epistatic to Gene B, whereas Gene
B is hypostatic to Gene A.
- Epistatic gene
- See Epistasis
- Exon
- A portion of a split gene that is included in the transcript of a gene and survives processing of the RNA to become part of the spliced messenger or a structural RNA. Exons generally occupy three distinct regions of genes that encode proteins. Exons in the first region are not translated into protein, but signal the beginning of RNA transcription and contain sequences that direct the mRNA to ribosomes for protein synthesis. Exons in the second region contain the information that is translated into the amino acid sequence of the protein, and are sometimes referred to as coding exons. Exons in the third region are transcribed into the part of the mRNA that contains the signals for the termination of translation and for the addition of a polyadenylate tail.
- Expect threshold
- The Expect threshold ("E") is a BLAST parameter that reflects the
number of matches expected to be found by chance. If the E-value
of a match is greater than the Expect threshold, the
match will not be reported.
Decreasing the E threshold will increase the stringency of the search:
fewer matches will be reported. On the other hand, increasing the E
threshold will decrease the stringency of the search and result in more
matches being reported. The E threshold default is set to 10
specifically for the SGD WU-BLAST tool. The E-value cutoff used for
other resources and tools at SGD is documented in their respective help
pages.
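The effect of the Expect threshold can be illustrated by filtering a list of hits on their E-values. The hit names and values below are made up, and the function is a sketch of the reporting rule rather than anything BLAST itself exposes.

```python
def report_hits(hits, expect_threshold=10.0):
    """Keep hits whose E-value does not exceed the Expect threshold."""
    return [(name, e_value) for name, e_value in hits
            if e_value <= expect_threshold]

# Hypothetical hits as (name, E-value) pairs.
hits = [("hitA", 1e-50), ("hitB", 0.5), ("hitC", 25.0)]

default_report = report_hits(hits)                        # threshold 10
strict_report = report_hits(hits, expect_threshold=1e-5)  # more stringent
```

Lowering the threshold from 10 to 1e-5 drops the weaker hit, which is the increase in stringency described in this entry.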
- External Transcribed Spacer (ETS)
- The ETS is a region of DNA in the rDNA repeat which flanks the 18S-5.8S-25S gene cluster and is included as part of its transcription unit. The 5' ETS is immediately upstream of the 18S gene and includes the A0 processing site. The 3' ETS is immediately downstream of the 25S gene.
- Far Western
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
detected between a protein immobilized on a membrane and a purified
protein probe.
- FASTA
- Program used to search simultaneously both protein and DNA sequence
databases (Pearson and Lipman, 1988). FASTA uses a fast search to
initially identify sequences with a high degree of similarity to the
query sequence and then conducts a second comparison on the selected
sequences. FASTA is slower than BLAST, but is more sensitive and sometimes
yields different results.
- Filter options
- Filtering masks portions of a query sequence that have low
compositional complexity (such as short internal repeats or poly-A
sequences) to reduce the frequency of statistically significant but
biologically uninteresting BLAST
results.
- Frequency of Optimal Codons (Fop)
- This index is the ratio of optimal codons to synonymous codons
(genetic code dependent). Fop values for the
original index are always between 0 (where no optimal codons are used)
and 1 (where only optimal codons are used). When calculating the
modified Fop index, negative values are
adjusted to zero. [Ikemura, T. (1981). Correlation between the
abundance of Escherichia coli transfer RNAs and the occurrence of the
respective codons in its protein genes: a proposal for a synonymous
codon choice that is optimal for the E. coli system. Journal of
Molecular Biology 151:389-409]
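A minimal sketch of the original Fop index follows. It assumes the standard genetic code, counts only codons that have at least one synonym (so Met, Trp, and stop codons are ignored), and takes the set of optimal codons as an input. The optimal codon set in the example is a placeholder, not a real yeast or E. coli set.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codons for amino acids that are encoded by more than one codon.
_FAMILY_SIZE = Counter(CODON_TABLE.values())
SYNONYMOUS = {c for c, aa in CODON_TABLE.items()
              if aa != "*" and _FAMILY_SIZE[aa] > 1}

def fop(cds: str, optimal_codons: set) -> float:
    """Fraction of synonymous codons in the gene that are optimal."""
    cds = cds.upper()
    all_codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    considered = [c for c in all_codons if c in SYNONYMOUS]
    if not considered:
        raise ValueError("no synonymous codons to score")
    return sum(c in optimal_codons for c in considered) / len(considered)

# Placeholder optimal set; ATG and TAA are ignored by the index.
value = fop("ATGGCTGCCTTGTAA", {"GCT", "TTG"})
```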
- FRET
- This term is used to identify and describe interaction data
displayed at SGD. In this type of experiment, an interaction is
inferred when close proximity of interaction partners is detected by
fluorescence resonance energy transfer between pairs of
fluorophore-labeled molecules, such as occurs between CFP (donor) and
YFP (acceptor) fusion proteins.
- Function
- See molecular function.
- GBrowse
- Developed by the Generic Model Organism Database (GMOD) project, GBrowse is an interactive genome browser that can be customized to show selected chromosomal features as well as display user provided annotations.
- GCG
- The Genetics Computer Group is a
private company involved in the development of sequence analysis
software.
- GenBank
- GenBank is
the DNA sequence database sponsored by the US National Institutes of
Health. GenBank is produced in collaboration with EMBL and DDBJ. There is also a searchable DNA sequence
database maintained by SGD (Yeast GenBank)
that contains the subset of DNA sequences submitted to GenBank that
have been derived from S. cerevisiae DNA. It includes results
of the systematic sequencing as well as results from individual
laboratories.
- Gene
- The definition of a gene changes as more properties are revealed. Two classes are generally recognized: (1) genes that are transcribed into mRNAs, which enter ribosomes and are translated into polypeptide chains, and (2) genes whose transcripts are used directly (tRNAs, rRNAs, snRNAs, etc.). Class 1 genes are also known as structural genes, and have been referred to as cistrons in earlier literature. There are also other shorter DNA segments that are not transcribed but instead serve as recognition sites for enzymes and other proteins that function during replication or transcription. These types of elements are generally referred to as regulatory sequences, and should not be confused with regulatory genes, which encode proteins that bind to regulatory sequences.
- Gene_Info
- The guide to the literature formerly called Gene_Info is now called the Literature Guide.
- Gene name
- With respect to S. cerevisiae
genetic nomenclature, a "gene name" refers to a name for a
specific genetic marker; S. cerevisiae gene names follow a
standardized format consisting of three letters (the gene symbol) followed by an integer
(e.g. ADE2). Dominant alleles of the gene (most often wild-type) are
denoted by all uppercase letters, while recessive alleles are denoted
by all lowercase letters. For more information
please refer to the guide to S. cerevisiae nomenclature,
published in Trends in Genetics.
- Within SGD, "gene name" is synonymous with Locus. The search option Search Gene
Names can be used to search for Gene or ORF names at SGD.
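The basic gene name format (three letters followed by an integer, uppercase for dominant and lowercase for recessive alleles) can be checked with a regular expression. This sketch covers only the simple pattern described here; it does not handle allele designations or other nomenclature details, and the function name is hypothetical.

```python
import re

# Three letters of a single case, followed by one or more digits.
GENE_NAME = re.compile(r"^(?:[A-Z]{3}|[a-z]{3})[0-9]+$")

def classify_gene_name(name: str):
    """Return 'dominant', 'recessive', or None if the format does not match."""
    if not GENE_NAME.match(name):
        return None
    return "dominant" if name[:3].isupper() else "recessive"

results = {n: classify_gene_name(n) for n in ("ADE2", "ade2", "YCL006C")}
```

A systematic ORF name such as YCL006C does not fit the pattern, so it is reported as None rather than as a gene name.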
- Gene Name Registry
- See the Gene Name Registry
documentation.
- Gene Ontology (GO)
- The Gene Ontology (GO) project was established to provide a
common language to describe aspects of a gene product's biology. The
use of a consistent vocabulary allows genes from different species to
be compared based on their GO annotations. For each of three
categories of biological information--molecular function, biological
process, and cellular component--a set of terms has been selected and
organized. Each set of terms uses a controlled vocabulary, and
parent-child relationships between terms are defined. This combination
of a controlled vocabulary with defined relationships between items is
referred to as an ontology. Within an ontology, a child may be a "part
of" or an example ("instance") of its parent. There are three
independently organized controlled vocabularies, or gene ontologies,
one for molecular function,
one for biological process, and
one for cellular
component. Many-to-many parent-child relationships are allowed in the
ontologies. A gene may be annotated to any level in an ontology, and
to more than one item within an ontology. The Gene Ontology project is
a collaboration between three model organism databases, FlyBase
(Drosophila), Saccharomyces Genome Database (SGD) and Mouse
Genome Informatics (MGI).
- Gene_product
- A description of the protein or RNA product (and
its function, if relevant) that is coded for by the gene.
- Gene/Sequence Resources
- This is a resource at SGD that allows one to retrieve a list of options for accessing
information available for 1) a named gene or sequence, 2) a specified
chromosomal region, or 3) a raw DNA or protein sequence. This
information includes biological information, table/map displays, and
sequence analysis and retrieval options.
- Gene Summary Paragraphs
- A Gene Summary Paragraph is a summary of published
biological information for a gene and its product which is designed to
familiarize both yeast and non-yeast researchers with the general
facts and important subtleties regarding a locus. SGD curators
compose Gene Summary Paragraphs using natural language and a
controlled vocabulary based on the Gene
Ontology (GO). Gene Summary Paragraphs contain references and
links to further information, and highlight connections between genes
from yeast and other species wherever possible.
- Gene symbol
- S. cerevisiae gene names consist of three letters (the gene
symbol) followed by an integer (e.g. ADE2). The 3-letter gene symbol
is almost always a mnemonic, standing for a description of a phenotype, gene product
or its
function. Most (but not all) gene symbols have only one
associated description, i.e., all the genes which share that 3-letter
gene symbol have a related phenotype, gene product or gene function.
- Genetic Map
- The S. cerevisiae Genetic Map was originally known as the
Mortimer
Map. The last such Genetic Map was Edition 12 released in January
1995. It is a representation of the order of and distances between
genetic markers (usually mutant alleles of genes) along each of the 16
different chromosomes. It is generated using the two-point data submitted from
laboratories world-wide. On the map, the genetic position of a gene is given
relative to the centromere, and is expressed in centiMorgans. SGD now offers a Combined
Physical and Genetic Map for each chromosome that is generated
from the most current genetic and physical map data in SGD.
- Genetic Position
- This term refers to the genetic distance between the gene and the
centromere, as derived from two-point data, and is expressed in
centiMorgans (cM).
Locations to the left of the centromere are represented as negative numbers, and
locations to the right of the centromere are represented as positive numbers. For example,
GCN4/YEL009C has a genetic position of -3 cM. This means the gene is 3
cM (also called map units) to the left of the centromere (on the left
arm of the chromosome). TRP2/YER090W has a
genetic position of 76 cM. This means it is 76 cM (map units) to the right of the
centromere (on the right arm
of the chromosome). Early yeast geneticists denoted the shorter
arm of each chromosome, in terms of genetic distance, as the left arm and the longer arm as the right
arm. However, later physical mapping efforts and sequencing of the genome showed that
for some chromosomes, the arm historically called "left" is physically longer
than the "right" arm. The Combined
Physical and Genetic Maps correlate physical distance (kilobase pairs) with genetic
distances (cM), which can vary greatly within and between chromosomes.
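- The sign convention above can be expressed as a small helper. This is an illustrative sketch only; the function name describe_genetic_position is an assumption, and the two values used are the examples given in this entry.

```python
def describe_genetic_position(position_cm):
    """Translate a signed genetic position (in cM) into (arm, distance from centromere)."""
    if position_cm < 0:
        arm = "left"        # negative numbers: left of the centromere
    elif position_cm > 0:
        arm = "right"       # positive numbers: right of the centromere
    else:
        arm = "centromere"  # zero: at the centromere itself
    return arm, abs(position_cm)


# Examples from the text: GCN4/YEL009C at -3 cM, TRP2/YER090W at 76 cM.
for gene, position in [("GCN4/YEL009C", -3), ("TRP2/YER090W", 76)]:
    arm, distance = describe_genetic_position(position)
    print(f"{gene}: {distance} cM from the centromere ({arm})")
```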
- genoSc
- A searchable DNA sequence
database maintained by SGD that contains the complete Saccharomyces
cerevisiae genome sequence as revealed by the international
systematic sequencing effort.
- GO
- See Gene Ontology.
- GO Annotation
- GO Annotations are statements generated from published literature
about the function(s) and biological role(s) of a gene product in the cell,
and where (location) in the cell the gene product carries out its
functions. These statements consist of four mandatory components: a gene
product, a term from one of the three Gene Ontology (GO) controlled
vocabularies, a reference, and an evidence code. A gene product is
typically a protein or a gene but can also be a functional RNA.
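- The four mandatory components can be pictured as a simple record. The sketch below is an assumption about how such a record might be modeled, not SGD's actual schema; the class name GOAnnotation, the field names, and the placeholder values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation; all four components are mandatory."""
    gene_product: str
    go_term: str
    reference: str
    evidence_code: str

    def __post_init__(self):
        # Reject an annotation if any mandatory component is missing or blank.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"missing mandatory component: {field.name}")


# Placeholder values only; these are not real annotation data.
annotation = GOAnnotation(
    gene_product="PLACEHOLDER_GENE",
    go_term="placeholder term",
    reference="PLACEHOLDER_REF",
    evidence_code="PLACEHOLDER_CODE",
)
print(annotation)
```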
- GO Annotation Method
- Identifies the methods used in the cited reference and the
curation method used to make a GO
annotation: Manually curated,
High-throughput, or Computational.
- GO Annotation Source
- Refers to the Annotating/Database group that made the GO
annotation.
- GO-Slim
- A GO-Slim is a selection of high-level terms from the Biological Process, Molecular Function, and Cellular Component ontologies. These are more
general terms that represent major branches in each ontology. For
example, the GO term nucleus is a GO-Slim term from the
Cellular Component ontology. Its children (perinuclear space, nuclear
matrix, etc.) are more detailed GO terms and not GO-Slim terms. All
available GO-Slims are developed by the SGD curators. The GO Term
Mapper identifies the GO-Slim terms for a list of genes based on
their annotation to detailed GO terms. The GO-Slim used with the GO
Term Mapper contains very high level GO terms. The go_slim_mapping.tab file available on the SGD ftp
site maps all gene products to a yeast-specific GO-Slim. The
yeast-specific GO-Slim contains a set of GO terms that best represent the major biological processes, functions, and cellular components that are found in S. cerevisiae.
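- The idea of mapping detailed annotations up to GO-Slim terms can be sketched as a walk from each annotated term to its ancestors. This is a toy illustration under stated assumptions, not the GO Term Mapper's implementation: the parent map contains only the terms mentioned in this entry, and the gene names geneA and geneB are hypothetical.

```python
# Toy child -> parents map. A term may have more than one parent.
PARENTS = {
    "nuclear matrix": {"nucleus"},
    "perinuclear space": {"nucleus"},
    "nucleus": {"cellular component"},
}
SLIM_TERMS = {"nucleus"}


def ancestors_and_self(term, parents):
    """Return the term together with every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def map_to_slim(annotations, parents, slim_terms):
    """Map each gene's detailed terms to the GO-Slim terms among their ancestors."""
    result = {}
    for gene, terms in annotations.items():
        slim_hits = set()
        for term in terms:
            slim_hits |= ancestors_and_self(term, parents) & slim_terms
        result[gene] = slim_hits
    return result


print(map_to_slim({"geneA": {"nuclear matrix"}, "geneB": {"nucleus"}}, PARENTS, SLIM_TERMS))
```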
- High score
- In the results of a BLAST search, the
score of the highest-scoring HSP found with each database sequence is
listed in the "high score" column.
- High Scoring Segment Pairs (HSPs)
- In a BLAST search, an HSP is a pair of
sequence fragments (one from the query sequence and the other from a
database sequence) that form a locally maximal alignment whose
score exceeds a pre-defined cutoff.
- High-throughput GO Annotations
- Refers to the GO annotation method that includes annotations made
from published experiments performed on a high-throughput or
genome-wide basis where the annotations are not individually reviewed by
curators. A curator reviews the evidence for only a subset of the results
from a high-throughput or genome-wide study, not for
each result.
- Hydropathicity of protein (GRAVY score)
- This index is the general average hydropathicity (GRAVY) score for
the hypothetical translated gene product. It is calculated as the
arithmetic mean of the hydropathic indices of the amino
acids in the sequence (Kyte and Doolittle 1982). This index has been used to quantify
the major correspondence analysis (COA) trends in the amino acid usage of
E. coli genes.
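- As a worked illustration of the calculation, the sketch below averages the Kyte and Doolittle (1982) hydropathy indices over a protein sequence. The function name gravy and the test sequences are assumptions made for the example; this is not the code used to produce the values shown at SGD.

```python
# Kyte and Doolittle (1982) hydropathy index for each standard amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein_sequence):
    """Arithmetic mean of the hydropathy indices of all residues in the sequence."""
    sequence = protein_sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    # A KeyError is raised for any non-standard residue symbol.
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


print(gravy("AR"))  # mean of 1.8 (A) and -4.5 (R)
```

Positive scores indicate a more hydrophobic protein overall, and negative scores a more hydrophilic one.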
- Hypostatic gene
- See Epistasis
- Identity
- An alternative comparison matrix for FASTA searches.
- Identity-weighted
- An alternative comparison matrix for
FASTA searches.
- Indel
- A hybrid term (combining the words "insertion" and "deletion")
used to describe a difference in sequence due to either an insertion
or a deletion event; especially used when the evolutionary direction
of the change is unspecified.
- Interactions Database
- See GRID.
- Internal Transcribed Spacer (ITS)
- The ITS is a region of DNA in the rDNA repeat which flanks the 5.8S gene and is included as part of the transcription unit of the 18S-5.8S-25S gene cluster. ITS1 is immediately upstream of the 5.8S gene and ITS2 is immediately downstream of the 5.8S gene.
- Intron
- A portion of a split gene that is transcribed into RNA, but subsequently removed from within the transcript prior to translation.
- Keyword
- A keyword is a word identified as particularly informative about an
object. In a sequence, a keyword often relates to the identity of a
gene or the function of the gene product. References often have a
list of keywords that are Medline MeSH terms. Keywords are useful
in text searches.
- Kyoto
- An external link in
the Locus or Clone page to the Kyoto Encyclopedia of
Genes and Genomes. The link goes directly to the information for
that specific enzyme.
- Last_update
- "Last_update" in the GO annotations page indicates the most recent
date that information was entered into the database for a given locus.
- Literature Guide
- The Literature Guide (formerly called Gene_Info) is a guide to the literature
for a given locus and is derived from journal articles. SGD performs a
search through all PubMed literature
for all papers mentioning that locus and any
aliases. SGD curators read the abstract or full text of those papers and assign
the papers to one or more Topics that describe the kind of biological
information they contain. The Literature Guide is thus
designed to help the user easily find the papers relevant to a given
locus. Please note, however, that since for some papers only abstracts are read, the
Literature Guide may not be a complete description of the information
contained in the papers.
- [Literature Guide Help]
- Literature Guide Annotation
- At SGD, Literature Guide annotations are topics that are
associated with papers in order to categorize them and to make it easier
for users to search for specific types of information. These
annotations may be linked to genes or not, depending on the
information in the paper. A complete list of literature guide topics
is available in the Literature Guide Help
document.
- Locus
- A "locus" most often is a gene, characterized by a mutant
phenotype or by a DNA sequence, which has been either genetically
mapped or otherwise localized (e.g. by DNA sequence comparison or
hybridization) to a particular spot in the yeast genome. A locus may
also be a DNA sequence
feature such as a centromere. A very small number of "loci" which
are contained in the database have not been genetically mapped or
otherwise localized, but instead have only been shown to be a mutant
phenotype that segregates as a single gene. Therefore these are not
"loci" in the strict sense of the word, but they are included in the
database because the names and information about these putative "loci"
have been published; see the SGD Gene Naming
Guidelines and the subsection therein on resolving gene
name problems for a description of how we deal with these putative
"loci".
- [Locus Help] [Search Gene Names]
- Locus history
- Locus history records any comments of interest associated with
the gene, such as mapping information, other names that the gene has
been called (especially in the case where the other name is used in
the database for yet a different locus), etc., and can be viewed by
clicking the Locus History link from the bottom of each locus page.
It includes update information from the Locus_notes category as well as
notes added since the conversion to Oracle. For reserved gene names,
the Locus history includes the reservation date and expiration date.
- Locus_notes
- The Locus_notes section of a locus history page is used to document
any comments of interest associated with the gene, such as mapping
information, other names that the gene has been called (especially in
the case where the other name is used in the database for yet a
different locus), etc. The number that precedes the comment refers to
the edition of Mortimer et
al. (i.e., the yeast genetic and physical map publication) in
which the comment first appears.
- Long Terminal Repeat (LTR)
- Identical sequences, typically several hundred nucleotides in
length, that are located both at the ends of intact Ty
retrotransposons and as solo elements present in multiple copies
throughout the genome. There are several types of LTR elements in
yeast: delta, tau, sigma and omega.
- Manually curated GO Annotations
- Refers to the GO Annotation Method that includes annotations made by curators reading the literature for each gene and making annotations from published papers when available. When published literature is available, such annotations may include those based on experiments, sequence similarity, or other computational analyses described in the paper, or on statements made by the authors. When no published literature is available for a gene, annotations may be made on the basis of curatorial judgements.
- Map
- If a locus has been genetically mapped, the "ORF Map" and "Genetic position" under the Sequence Coordinates section of the locus page will
display details of the locus/feature. The Roman numeral to the
right of "Map" indicates the chromosome to which the locus maps. The
number to the right of "Genetic Position" indicates the map position of the
locus (in centiMorgans) from the centromere, where negative numbers
indicate distances to the left of the centromere (the left arm) and
positive numbers correspond to right arm distances.
- Mapping_data
- This displays links to all of the 2-point cross
tetrad data where the locus was used as one of the markers.
- Medline
- Medline is the National Library of
Medicine's database of biomedical papers; it contains all citation
information for each paper, as well as abstracts for most of the
papers.
- Medline UID
- The "Medline" tag that appears within the listed information for a
paper contains the Medline unique identifying number (UID) for the
paper; the first 2 numbers usually (but not always) indicate the year
of publication.
- Merged Feature
- A chromosomal feature that was once annotated as a distinct entity, but that has now been subsumed by another feature. Typically, features become "Merged" because of a change in chromosomal sequence or annotation (e.g. YAR004W). For record keeping, the "Merged" feature is not removed from SGD, but is instead given the "Merged" status as a flag.
- Minimal Tiling Path
- A map or table showing the placement and order of a set of clones that
completely and contiguously cover a DNA segment of
interest.
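- One common way to choose such a set of clones is a greedy interval cover: at each step, pick the clone that starts at or before the position covered so far and extends furthest. The sketch below assumes clones are given as hypothetical (name, start, end) tuples with half-open coordinates; it is an illustration, not the method used to build any particular tiling path.

```python
def minimal_tiling_path(clones, start, end):
    """Pick the fewest clones that contiguously cover the half-open region [start, end).

    clones: iterable of (name, clone_start, clone_end) tuples.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = start
    i = 0
    while covered_to < end:
        best = None
        # Among clones starting at or before the covered position,
        # keep the one that reaches furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone covers position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical clones, coordinates in arbitrary units.
clones = [("c1", 0, 50), ("c2", 10, 40), ("c3", 30, 90), ("c4", 45, 120), ("c5", 100, 150)]
print([name for name, _, _ in minimal_tiling_path(clones, 0, 150)])
```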
- MIPS
- The initials stand for Munich Information Center for Protein Sequences. MIPS is the coordinator
of the European Commission Genome Projects.
- Molecular Function
- One of the three categories used by the Gene Ontology project,
molecular function describes the tasks performed by individual
gene products; examples are transcription factor and DNA
binding.
-