The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accuracy has been proposed. Six cross-validation tests were performed to confirm the above accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. The estimate is based on the assumption that the unknown genes have similar statistical properties to the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is </=5645, significantly smaller than the 5800-6000 which is widely accepted, and much larger than the 4800 estimated by another group recently. The mitochondrial genes were not included into the above estimate. A codingness index called the YZ score (YZ OE [0,1]) is proposed to recognize protein coding genes in the yeast genome. Among the ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database, those recognized as non-coding by the present algorithm are listed in this paper in detail. The criterion for a coding or non-coding ORF is simply decided by YZ > 0.5 or YZ < 0.5, respectively. The YZ scores for all the ORFs annotated in the MIPS database have been calculated and are available on request by sending e-mail to the corresponding author.
|Evidence ID||Analyze ID||Interactor||Interactor Systematic Name||Interactor||Interactor Systematic Name||Type||Assay||Annotation||Action||Modification||Phenotype||Source||Reference||Note|
|Evidence ID||Analyze ID||Gene||Gene Systematic Name||Gene Ontology Term||Gene Ontology Term ID||Qualifier||Aspect||Method||Evidence||Source||Assigned On||Reference||Annotation Extension|
|Evidence ID||Analyze ID||Gene||Gene Systematic Name||Phenotype||Experiment Type||Experiment Type Category||Mutant Information||Strain Background||Chemical||Details||Reference|
|Evidence ID||Analyze ID||Regulator||Regulator Systematic Name||Target||Target Systematic Name||Experiment||Conditions||Strain||Source||Reference|