Poland D (2004) The phylogeny of persistence in DNA. Biophys Chem 112(2-3):233-44
Abstract: We continue our study, Poland [Biophysical Chemistry 110 (2004) 59-2], of the distribution of C or G (C-G for short) in the DNA of select organisms, in particular, the tendency for C-G to cluster on all scales with respect to the number of bases considered. We previously found that if we counted the number of C-G bases in consecutive, nonoverlapping boxes containing a total of m bases, then the width of the distribution function describing how many C-G bases are in a box increases with respect to m dramatically relative to the width expected for a random distribution. The relative width of the C-G composition distribution function was found to vary accurately as a power law with respect to m, the size of the box, over a very wide range of m values. We express the power law in terms of a characteristic exponent gamma, that is, the relative widths of the distributions vary as m^gamma. The enhanced relative width of the distribution functions is a direct consequence of the tendency for boxes of similar composition to follow one another. This tendency represents persistence in composition from box to box and hence we refer to gamma as the persistence exponent. The occurrence of a power law means that the tendency for C-G to cluster is present on all scales of sequence length (box size) up to the total length of the chromosome which for bacteria is the entire genome. The persistence exponent gamma that characterizes the power law is thus an important parameter describing the distribution of C-G on all scales from individual base pairs up to the total length of the DNA sample considered. In the present paper, we determine the characteristic exponent gamma and the associated fractal dimension of DNA samples for a selection of species representing all of the major types of organism, that is, we explore the phylogeny of the exponent gamma.
Here we treat six prokaryotes and six eukaryotes which, together with the species we have previously treated, bring the total number of species we have examined to 15. We find the power law form for the C-G distribution for all of the species treated and hence this behavior seems to be ubiquitous. The values of the characteristic exponent gamma that we find tend to cluster around the value gamma=0.20 with no obvious pattern with respect to phylogeny. The extreme values that we obtain are gamma=0.057 (yeast) and gamma=0.386 (human). We conclude by showing that the persistence of C-G clustering on the scale of the length of a chromosome is dramatically illustrated by interpreting the C-G distribution as a random walk.
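The box-counting procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code, and it rests on several assumptions that the abstract does not spell out: "width" is taken to be the standard deviation of the box counts, the random reference is a binomial distribution with the sequence's overall C-G fraction, gamma is estimated as the least-squares slope of log(relative width) against log(m), and the random walk steps up for C or G and down for A or T. All function names are hypothetical.

```python
import math


def box_counts(seq, m):
    """Count C or G bases in consecutive, nonoverlapping boxes of m bases.

    Assumes an uppercase sequence; a trailing partial box is dropped.
    """
    n_boxes = len(seq) // m
    return [
        sum(1 for base in seq[i * m:(i + 1) * m] if base in "CG")
        for i in range(n_boxes)
    ]


def relative_width(seq, m):
    """Observed width of the box-count distribution over the random width.

    Assumption: "width" is the standard deviation, and the random reference
    is a binomial distribution with the sequence's overall C-G fraction p.
    """
    counts = box_counts(seq, m)
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    p = sum(1 for base in seq if base in "CG") / len(seq)
    return math.sqrt(variance) / math.sqrt(m * p * (1 - p))


def persistence_exponent(seq, box_sizes):
    """Estimate gamma as the least-squares slope of log(width) vs log(m)."""
    xs = [math.log(m) for m in box_sizes]
    ys = [math.log(relative_width(seq, m)) for m in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


def cg_walk(seq):
    """Cumulative C-G excess along the sequence, read as a random walk.

    Assumption: each C or G steps up by (1 - p) and each A or T steps down
    by p, so the walk returns to zero at the end of the sequence.
    """
    p = sum(1 for base in seq if base in "CG") / len(seq)
    walk, position = [], 0.0
    for base in seq:
        position += (1 - p) if base in "CG" else -p
        walk.append(position)
    return walk
```

Under these assumptions, a sequence with no persistence should give a relative width near 1 at every box size, and therefore gamma near 0. A strongly blocked toy sequence such as `("C" * 1000 + "A" * 1000) * 10` gives gamma of approximately 0.5 for box sizes 10 and 100, because every box is then either all C-G or has none.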
|Status: Published||Type: Journal Article||PubMed ID: 15572254|
Topics addressed in this paper
|Topics||Topics not linked to Genes|
|Other genomic analysis|