Fields Notes 17:3 | Page 18

Ada Lovelace Day

The Mathematics of Genomes

Your DNA is extremely organized. Each of the 3-billion base pairs that make up your genome is placed in exactly the right position so that everything works properly, and so that you are, well, you. But if you were to look at your genomic sequence – all 3 billion bases laid out in front of you – nothing about it would seem organized. To the untrained eye, it’s a jumble of letters with no discernible order or structure.

Perhaps it’s fitting, then, that Lila Kari, Professor in the Department of Computer Science at the University of Waterloo, used chaos game theory to represent genomic sequences.
The premise is simple: begin with a single point at the centre of a square where each corner represents one of the DNA bases, A, C, T, or G. The next point is then determined by the midpoint of the line connecting the first point and the corner matching the next letter in the sequence. The third point is the midpoint of the line from that point to the corner matching the next letter, and so on.
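The midpoint rule is easy to try out. Below is a minimal Python sketch of it, not Kari's own code: the function name `chaos_game_points` is made up for illustration, and the assignment of bases to corners is an assumption, since the article does not say which base sits at which corner.

```python
# Hypothetical corner layout on the unit square (an assumption, not from the article).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(sequence, start=(0.5, 0.5)):
    """Return the chaos game points for a DNA sequence, starting at the centre."""
    x, y = start
    points = [(x, y)]  # the first point is the centre of the square
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip symbols other than A, C, G, T (for example N)
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in chaos_game_points("ACGT"):
        print(point)
```

Plotting the returned points for a long sequence gives the kind of image the article describes; a different corner layout would rotate or mirror the picture without changing the idea.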
Suddenly, when faced with these visual representations of the genome, the underlying structure and organization become clear. Patterns emerge that are very different from species to species. With a simple mathematical manipulation, the genome can be transformed into something quantifiable.
By computing the “image distance” between each pair of graphical genome representations and then employing multidimensional scaling, Kari is transforming the tree of life into the map of life, with each DNA sequence represented as a single point in 3D space and the spatial proximity between any two points reflecting their degree of similarity.
Chaos game theory representation of the mitochondrial genome of the human (top) and red algae (bottom).
When Kari’s group performed this comparison for more than 3,100 complete mitochondrial genomes, known phyla and subphyla (mammals, amphibians, reptiles, etc.) clustered together in non-overlapping subsets with very few exceptions, and the clusters agreed remarkably well with classical phylogenetic trees. What gave Kari goosebumps is that species that crept over the boundaries often still appeared logically placed, like species of fish with primitive lungs that slid into the amphibian cluster.
“This method captured the fact that they have common characteristics and you are really unable to untangle them,” says Kari. It also determined that the modern human is most closely related to the chimp and furthest from the cucumber.
The beauty of this approach is that it doesn’t rely on direct comparisons of specific genes that may or may not exist in all organisms. The image distance employed, the Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length nine in DNA sequences, without reference to what those sequences are or which organism genome they were taken from.
“It can be a computer-generated sequence that makes no sense whatsoever or alien DNA from outer space. I don’t care, bring it on,” laughs Kari.
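The idea of comparing oligomer occurrences without reference to genes can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used by Kari's group: the function names are hypothetical, and a plain Euclidean distance between oligomer frequency profiles stands in for DSSIM, which is computed on the images themselves.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(sequence, k=9):
    """Relative frequency of every length-k oligomer made only of A, C, G, T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # ignore windows with ambiguous symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def frequency_distance(freq_a, freq_b):
    """Stand-in distance (Euclidean), NOT the DSSIM named in the article."""
    keys = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(key, 0.0) - freq_b.get(key, 0.0)) ** 2 for key in keys))


def distance_matrix(sequences, k=9):
    """Pairwise distances between all sequences, as a list of lists."""
    profiles = [kmer_frequencies(s, k) for s in sequences]
    return [[frequency_distance(a, b) for b in profiles] for a in profiles]


if __name__ == "__main__":
    toy = ["ACGTACGTACGTACGT", "ACGTACGTTCGTACGT", "TTTTTTTTTTTTTTTT"]
    for row in distance_matrix(toy, k=3):
        print([round(d, 3) for d in row])
```

A matrix of pairwise distances like this one is the kind of input that multidimensional scaling takes in order to place each sequence as a point in space. Nothing in the code needs to know which organism a sequence came from.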