Coding sequences can give rise to long open reading frames (ORFs) on
their antisense strand. Such overlapping ORFs may later be
duplicated and translocated to other regions of the genome. Such
noncoding, antisense pseudogenes are not subject to selection. They
may accumulate mutations and diverge very quickly. If they are not
drastically shortened, they may be mistaken for coding sequences,
because the probability of spontaneous generation of such long ORFs is
very low. By measuring coding/noncoding DNA strand asymmetry, we have
found that there are about 2500 antisense ORFs longer than 100 codons
in the Saccharomyces cerevisiae genome. They have the
properties of the antisense strands of known protein-coding genes and most
probably do not code for proteins. About 1000 of them are still present in
databases. This implies that the total number of protein-coding ORFs
longer than 100 codons does not exceed 4800. This number agrees well
with experimental results on the yeast genome, e.g. the total number
of transcripts assigned to ORFs.
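
The notion of an antisense ORF used above can be illustrated with a minimal sketch. It is not the strand-asymmetry measure used in this work, which the abstract does not specify. It only scans the reverse complement of a sequence for the longest ATG-to-stop ORF and compares it with the 100-codon threshold. All function names are hypothetical, and the standard stop codons (TAA, TAG, TGA) are assumed.

```python
# Illustrative sketch only: helper names are hypothetical and this is NOT
# the strand-asymmetry method referred to in the abstract.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
MIN_CODONS = 100                     # length threshold quoted in the abstract


def reverse_complement(seq):
    """Return the antisense (reverse-complement) strand of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF in the three forward frames of seq.

    The length is counted in codons, including ATG and excluding the stop.
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def antisense_orf_codons(coding_seq):
    """Longest ORF, in codons, found on the antisense strand of coding_seq."""
    return longest_orf_codons(reverse_complement(coding_seq))


def has_long_antisense_orf(coding_seq, min_codons=MIN_CODONS):
    """True if the antisense strand carries an ORF longer than min_codons."""
    return antisense_orf_codons(coding_seq) > min_codons
```

For a real gene, coding_seq would be the annotated sense-strand sequence. The check reports only whether a long antisense ORF exists, not whether it codes for a protein.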