DNA Geometry
Alex Kasman
Department of Mathematics
The genetic code, which converts a triple of DNA bases (a "codon") into an amino acid, is well known. Less well known is that the DNA sequence also influences the geometry of the DNA molecule itself. (See Olson et al. 1998 and their supplemental materials.) For example, the DNA sequences
GGCAAAAACGGGCAAAAACGGGCAAAAACGGGC
GGAAAAAACGGCCAAAAACGTGCAAAGACCGGC
both encode the same protein. However, the expected geometries of the DNA molecules with those sequences are very different from each other. The first one bends sharply while the second is nearly straight. (See figure.) This observation inspired me to mathematically investigate the relationship between the genetic code and the "geometric code".
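The claim that the two sequences encode the same protein can be checked directly. The following is a minimal Python sketch, not taken from the Mathematica notebook, that translates both sequences using the standard genetic code; the names `CODON_TABLE`, `translate`, and `differing_positions` are illustrative choices, and the sequences are assumed to be read in frame from their first base.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA string, read in frame from its first base, into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def differing_positions(s, t):
    """Return the 1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(s, t)) if x != y]

seq1 = "GGCAAAAACGGGCAAAAACGGGCAAAAACGGGC"
seq2 = "GGAAAAAACGGCCAAAAACGTGCAAAGACCGGC"

print(translate(seq1))
print(translate(seq2))
print(translate(seq1) == translate(seq2))
print(differing_positions(seq1, seq2))
```

Every base at which the two sequences differ falls in the third position of a codon, which is why the encoded amino acids are unchanged even though the expected geometry is not.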
Downloads:
Abstract: It is well known that sequences of bases in DNA are translated into sequences of amino acids in cells via the genetic code. More recently it has been discovered that the sequence of DNA bases also influences the geometry and deformability of the DNA. These two correspondences represent a naturally arising example of duplexed codes, providing two different ways of interpreting the same DNA sequence. This paper will set up the notation and basic results necessary to mathematically investigate the relationship between these two natural DNA codes. It then undertakes two very different such investigations: one graphical approach based only on expected values and another analytic approach incorporating the deformability of the DNA molecule and approximating the mutual information of the two codes. Special emphasis is paid to whether there is evidence that pressure to maximize the duplexing efficiency influenced the evolution of the genetic code. Disappointingly, the results fail to support the hypothesis that the genetic code was influenced in this way. In fact, applying both methods to samples of realistic alternative genetic codes shows that the duplexing of the genetic code found in nature is just slightly less efficient than average. The implications of this negative result are considered in the final section of the paper.
What I am posting here for download is the Mathematica notebook that I used to generate the figures (like the one above) and to compute the measures of duplexing efficiency in the paper. It could be of use to anyone who wants to verify my code and conclusions, and also to people doing other investigations involving the sequence-dependent geometry of DNA. If you do download this file, I request only that you cite it if you use it and that you do not redistribute it yourself. In addition, I am posting a zip file that contains the values of the function "numberinboxgivencodon" that was used in computing the mutual information. If you download the notebook you can compute these values on your own, but since that would take a very long time you might prefer to just load them directly.
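For readers unfamiliar with mutual information, here is a minimal Python sketch of how it can be computed from a table of counts. It is not the notebook's code and makes no claim about the actual format of the "numberinboxgivencodon" values: the function name `mutual_information`, the pairing of an amino acid label with a geometry "box" label, and the toy counts are all assumptions made purely for illustration.

```python
import math
from collections import defaultdict

def mutual_information(joint_counts):
    """Mutual information I(X;Y) in bits, from a dict {(x, y): nonnegative count}."""
    total = sum(joint_counts.values())
    px = defaultdict(float)  # marginal distribution of X
    py = defaultdict(float)  # marginal distribution of Y
    for (x, y), n in joint_counts.items():
        px[x] += n / total
        py[y] += n / total
    mi = 0.0
    for (x, y), n in joint_counts.items():
        if n > 0:  # zero-count cells contribute nothing
            pxy = n / total
            mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

# Made-up toy counts (NOT real data): amino acid label paired with a geometry box label.
independent = {("K", "box0"): 1, ("K", "box1"): 1, ("G", "box0"): 1, ("G", "box1"): 1}
dependent = {("K", "box0"): 1, ("G", "box1"): 1}

print(mutual_information(independent))  # the geometry box says nothing about the amino acid
print(mutual_information(dependent))    # the geometry box determines the amino acid
```

In the first toy table the two labels are statistically independent, so the mutual information is zero; in the second, each label determines the other, giving one full bit.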