NTRS - NASA Technical Reports Server

Statistical properties of DNA sequences
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
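For readers unfamiliar with detrended fluctuation analysis, the following is a minimal sketch of the general technique applied to a DNA sequence. It is not taken from the paper. The function names (`dna_walk`, `dfa`, `dfa_exponent`), the pyrimidine/purine step rule, the use of linear detrending, and the box sizes in the usage note are all assumptions made for illustration.

```python
import numpy as np


def dna_walk(seq):
    """Map a DNA string to a cumulative 'walk' profile.

    Assumed rule (not stated in this record): pyrimidines (C, T) step +1,
    purines (A, G) step -1. The mean step is removed before summing.
    """
    steps = np.array([1.0 if base in "CT" else -1.0 for base in seq.upper()])
    return np.cumsum(steps - steps.mean())


def dfa(profile, box_sizes):
    """Return the RMS detrended fluctuation F(n) for each box size n.

    The profile is cut into non-overlapping boxes of length n, a
    least-squares line is fitted inside each box, and the root-mean-square
    of the residuals over all boxes is reported.
    """
    profile = np.asarray(profile, dtype=float)
    fluctuations = []
    for n in box_sizes:
        n_boxes = len(profile) // n
        # Shape (n, n_boxes): one column per box.
        boxes = profile[: n_boxes * n].reshape(n_boxes, n).T
        x = np.arange(n)
        # polyfit accepts a 2-D y and fits every column at once.
        slope, intercept = np.polyfit(x, boxes, 1)
        trend = np.outer(x, slope) + intercept
        fluctuations.append(np.sqrt(np.mean((boxes - trend) ** 2)))
    return np.array(fluctuations)


def dfa_exponent(profile, box_sizes):
    """Estimate the scaling exponent alpha from F(n) ~ n**alpha."""
    f = dfa(profile, box_sizes)
    alpha, _ = np.polyfit(np.log(box_sizes), np.log(f), 1)
    return alpha
```

As a usage sketch, `dfa_exponent(dna_walk(seq), [8, 16, 32, 64, 128, 256])` returns an estimate of the exponent for a sequence `seq` that is much longer than the largest box. An uncorrelated sequence is expected to give a value near 0.5, while long-range correlations of the kind described in the abstract would show up as a value above 0.5.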
Document ID
20040089405
Acquisition Source
Legacy CDMS
Document Type
Reprint (Version printed in journal)
Authors
Peng, C. K.
(Harvard Medical School, Boston, MA 02215, United States)
Buldyrev, S. V.
Goldberger, A. L.
Havlin, S.
Mantegna, R. N.
Simons, M.
Stanley, H. E.
Date Acquired
August 21, 2013
Publication Date
January 1, 1995
Publication Information
Publication: Physica A
Volume: 221
ISSN: 0378-4371
Subject Category
Life Sciences (General)
Distribution Limits
Public
Copyright
Other
Keywords
Non-NASA Center
NASA Discipline Cardiopulmonary
