Small is beautiful and meaningful: Identification and
characterization of Non-Annotated ORFs (NORFs) in S. cerevisiae.
Munira Basrai, James Kastenmayer, Carole Carter
Genetics, National Cancer Institute, 8901 Wisconsin Ave, Bethesda, MD
20889, USA
Annotation of the S. cerevisiae genome revealed
6275 ORFs, which include genes that were previously characterized and
those that encode proteins of at least 100 amino acids.
Computational identification of small ORFs (smORFs) (<100 amino acids)
based on sequence analysis alone is severely limited by high false
positive rates. In addition, smORFs have been missed in traditional
genetic screens because of their small target size. Hence, it is a
challenge to identify small genes that encode a biologically important
class of molecules but are 'buried' in an enormous pile of meaningless
short ORFs. During the past year, evidence for the presence of hundreds of
smORFs has been accumulating. These data are derived primarily from two
approaches: (a) RNA and protein based expression analysis and (b)
comparative genomics. In the first expression-based study, Serial
Analysis of Gene Expression (SAGE) revealed over 300 small NORFs that
were expressed at more than one copy per cell. Microarrays and global
analysis of proteins confirmed and extended the SAGE study, validating
the presence of NORFs. In a second expression-based study, NORFs were
identified based on expression of a gene containing a transposon bearing
a lacZ reporter. In the second approach, comparative genomics identified
new NORFs and showed that a
subset of NORFs identified through expression analyses is
evolutionarily conserved among different fungal species, including
Ashbya gossypii and six different Saccharomyces species, and in
other eukaryotic systems, including humans. These analyses have also
identified small non-coding RNAs (ncRNAs). Compilation of the NORF data
derived from independent approaches suggests the presence of at least
300 NORFs. We are undertaking a collaborative project to generate
knockouts of these NORFs. These knockout strains will be a valuable
resource for in vivo experimental data and will further our
understanding of fundamental biological problems.