NTRS - NASA Technical Reports Server

Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences, each longer than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta = 0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences the average value of the spectral exponent beta is positive (0.16 +/- 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than 10^-10. We obtain independent confirmation of these findings using detrended fluctuation analysis (DFA), a method designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent methods, FFT and DFA, increases confidence in the reliability of our conclusion.
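The abstract names the two estimators (FFT spectral exponent beta and the DFA exponent) but not their implementation details. The following is a minimal Python/NumPy sketch of both, under assumptions that are not stated in this record: nucleotides are mapped to +1 (pyrimidines C, T) and -1 (purines A, G); the FFT estimate averages periodograms over non-overlapping 512-bp windows and fits log S(f) against log f over periods of 10 to 100 bp; DFA uses linear detrending in non-overlapping boxes of 10 to 100 bp. The helper names `dna_to_signal`, `spectral_exponent`, and `dfa_exponent` are hypothetical. For a stationary signal the two exponents are expected to satisfy beta ≈ 2*alpha - 1, so an uncorrelated sequence gives beta ≈ 0 and alpha ≈ 0.5.

```python
import numpy as np

# Hypothetical purine/pyrimidine mapping (an assumption, not taken from the record):
# pyrimidines C, T -> +1; purines A, G -> -1; any other symbol (e.g. N) is dropped.
_PY_PU = {"C": 1.0, "T": 1.0, "A": -1.0, "G": -1.0}


def dna_to_signal(seq: str) -> np.ndarray:
    """Convert a DNA string into a +/-1 numeric signal."""
    return np.array([_PY_PU[b] for b in seq.upper() if b in _PY_PU])


def spectral_exponent(u: np.ndarray, min_period: int = 10,
                      max_period: int = 100, window: int = 512) -> float:
    """Estimate beta in S(f) ~ 1/f**beta from window-averaged FFT periodograms."""
    n_win = len(u) // window
    if n_win == 0:
        raise ValueError("sequence shorter than one window")
    segs = u[: n_win * window].reshape(n_win, window)
    segs = segs - segs.mean(axis=1, keepdims=True)  # remove each window's mean
    power = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)
    freqs = np.fft.rfftfreq(window)  # frequency in cycles per bp
    # keep only frequencies whose periods lie between min_period and max_period bp
    mask = (freqs >= 1.0 / max_period) & (freqs <= 1.0 / min_period)
    slope, _ = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
    return -slope  # S(f) ~ f**(-beta), so beta is minus the log-log slope


def dfa_exponent(u: np.ndarray, scales=None) -> float:
    """Estimate the DFA exponent alpha in F(l) ~ l**alpha (linear detrending)."""
    y = np.cumsum(u - u.mean())  # integrated profile ("DNA walk")
    if scales is None:
        scales = np.unique(np.logspace(1, 2, 10).astype(int))  # box sizes 10..100 bp
    fluct = []
    for size in scales:
        n_box = len(y) // size
        boxes = y[: n_box * size].reshape(n_box, size)
        x = np.arange(size)
        # least-squares line fitted in every box at once (one column per box)
        slope, intercept = np.polyfit(x, boxes.T, 1)
        trend = np.outer(slope, x) + intercept[:, None]
        # root-mean-square deviation from the local linear trends
        fluct.append(np.sqrt(np.mean((boxes - trend) ** 2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return alpha


if __name__ == "__main__":
    # Synthetic, uncorrelated sequence: a null case, not GenBank data.
    rng = np.random.default_rng(0)
    seq = "".join(rng.choice(list("ACGT"), size=200_000))
    u = dna_to_signal(seq)
    beta, alpha = spectral_exponent(u), dfa_exponent(u)
    print(f"beta = {beta:.3f}, alpha = {alpha:.3f}, 2*alpha - 1 = {2 * alpha - 1:.3f}")
```

On an uncorrelated random sequence both exponents should sit near their null values (beta near 0, alpha near 0.5); a positive beta, as reported above for noncoding sequences, would correspond to alpha above 0.5. Running both estimators on the same signal is a cheap way to reproduce the kind of FFT/DFA cross-check the abstract describes.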
Document ID
20040142070
Acquisition Source
Legacy CDMS
Document Type
Reprint (Version printed in journal)
Authors
Buldyrev, S. V.
(Boston University Massachusetts 02215, United States)
Goldberger, A. L.
Havlin, S.
Mantegna, R. N.
Matsa, M. E.
Peng, C. K.
Simons, M.
Stanley, H. E.
Date Acquired
August 22, 2013
Publication Date
May 1, 1995
Publication Information
Publication: Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics
Volume: 51
Issue: 5
ISSN: 1063-651X
Subject Category
Life Sciences (General)
Distribution Limits
Public
Copyright
Other
Keywords
Non-NASA Center
NASA Discipline Cardiopulmonary
NASA Discipline Number 14-10