Statistical and linguistic features of DNA sequences
We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range: base pairs separated by thousands of base pairs are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model, based on a generalization of the classic Lévy walk, that accounts for the presence of long-range power-law correlations. Finally, we briefly describe some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
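As an illustration of the DFA step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a purine/pyrimidine "DNA walk" (C or T steps +1, anything else steps -1), linear detrending inside non-overlapping boxes, and a scaling exponent taken as the slope of log F(n) against log n. The function names, the step rule, and the box sizes are all illustrative assumptions; the record itself does not specify them.

```python
import math


def dna_walk(seq):
    """Cumulative 'DNA walk' of a sequence.

    Assumed rule: pyrimidines (C, T) step +1; every other character
    (including A, G and ambiguous bases) steps -1, a simplification.
    """
    walk, y = [], 0
    for base in seq.upper():
        y += 1 if base in "CT" else -1
        walk.append(y)
    return walk


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def fluctuation(walk, n):
    """RMS deviation of the walk from a local linear trend,
    computed over non-overlapping boxes of length n."""
    n_boxes = len(walk) // n
    xs = list(range(n))
    total = 0.0
    for b in range(n_boxes):
        box = walk[b * n:(b + 1) * n]
        slope, intercept = linear_fit(xs, box)
        total += sum((y - (slope * x + intercept)) ** 2
                     for x, y in zip(xs, box))
    return math.sqrt(total / (n_boxes * n))


def dfa_exponent(seq, box_sizes):
    """Slope of log F(n) versus log n over the given box sizes.

    Note: a perfectly linear walk (e.g. a run of a single base) gives
    F(n) = 0, for which the logarithm is undefined.
    """
    walk = dna_walk(seq)
    log_n = [math.log(n) for n in box_sizes]
    log_f = [math.log(fluctuation(walk, n)) for n in box_sizes]
    slope, _ = linear_fit(log_n, log_f)
    return slope
```

Under these assumptions, an uncorrelated random sequence should give an exponent close to 0.5, while long-range power-law correlations of the kind reported for noncoding regions would show up as an exponent above 0.5.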
Document ID
20040089762
Acquisition Source
Legacy CDMS
Document Type
Reprint (Version printed in journal)
Authors
Havlin, S.
(Boston University MA 02215, United States)
Buldyrev, S. V.
Goldberger, A. L.
Mantegna, R. N.
Peng, C. K.
Simons, M.
Stanley, H. E.
Date Acquired
August 21, 2013
Publication Date
June 1, 1995
Publication Information
Publication: Fractals
Volume: 3
Issue: 2
ISSN: 0218-348X
Subject Category
Life Sciences (General)
Distribution Limits
Public
Copyright
Other
Keywords
Non-NASA Center
NASA Discipline Cardiopulmonary
NASA Discipline Number 14-10
NASA Program Space Physiology and Countermeasures
