Comparative Genomics tells us which ORFs are genes?
J. Michael Cherry, Saccharomyces Genome Database
Genetics, Stanford University, 300 Pasteur Drive, Stanford, CA 94305-5120, USA
Which regions of the genome that hypothetically encode a protein, known as
open reading frames (ORFs), are actually transcribed and translated by
the yeast cell? The systematic S. cerevisiae S288C genome sequence
contains over 24,000 hypothetical ORFs that can encode a peptide of at
least 50 amino acids. The genome sequencers
devised rules that limited the number of these regions that were given
standard ORF names, creating an initial set of hypothetical ORFs. The
genomic sequences of many yeast species phylogenetically close to S.
cerevisiae have been partially determined. These genomic sequences
allow an evolutionary analysis to be conducted to determine which of
these hypothetical ORFs are biologically relevant. These data indicate
that at least 5,100 ORFs are conserved between S. cerevisiae and
at least one other Saccharomyces species. These represent regions
of the genome that have been maintained by natural selection within the
Saccharomyces genus. The comparative analysis also suggests that many
pairs of S. cerevisiae ORFs should be merged into one
longer reading frame. The Saccharomyces Genome Database (SGD) has
begun a collaboration to verify the S288C sequence for these putative
frame-shift regions. Combined with positive experimental results, the
comparative results allow the number of yeast genes to be refined and
let us distinguish genes from hypothetical or questionable ORFs.
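
How many hypothetical ORFs a genome contains depends on how an ORF is defined. As a minimal sketch, assuming an ORF runs from the first ATG to the next in-frame stop codon and must encode at least 50 amino acids, a six-frame scan might look like the following. The names `find_orfs` and `reverse_complement` are hypothetical, and this is not the rule set the genome sequencers used: it ignores introns and applies no overlap or naming rules.

```python
# Illustrative six-frame ORF scan. Assumption: an ORF is ATG ... in-frame stop,
# with at least `min_codons` coding codons (the stop codon is not counted).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """Return a list of (strand, frame, start, end) tuples.

    start and end are 0-based, end-exclusive offsets on the scanned strand
    (for "-" they refer to the reverse-complemented sequence); end includes
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # offset of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    # Count codons from ATG up to, not including, the stop.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: ATG plus 49 alanine codons, then a stop (50 codons total).
    toy = "ATG" + "GCT" * 49 + "TAA"
    print(find_orfs(toy))
```

Lowering `min_codons` increases the number of candidates quickly, which is one reason a raw count of hypothetical ORFs is much larger than the number of conserved ORFs reported above.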