Fields Notes 17:3 | Page 18

Ada Lovelace Day

The Mathematics of Genomes

Your DNA is extremely organized. Each of the 3 billion base pairs that make up your genome is placed in exactly the right position so that everything works properly, and so that you are, well, you. But if you were to look at your genomic sequence, all 3 billion bases laid out in front of you, nothing about it would seem organized. To the untrained eye, it’s a jumble of letters with no discernible order or structure.

Perhaps it’s fitting, then, that Lila Kari, Professor in the Department of Computer Science at the University of Waterloo, used the chaos game representation to depict genomic sequences.
The premise is simple: begin with a single point at the centre of a square whose four corners represent the DNA bases A, C, T and G. The next point is the midpoint of the line connecting the first point to the corner matching the first letter in the sequence. The third point is the midpoint of the line from that point to the corner matching the next letter, and so on.
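The procedure described above fits in a few lines of Python. This is a minimal sketch for illustration, not Kari's code: the unit square, the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) and the function name `cgr_points` are assumptions, and ambiguous symbols such as N are simply skipped.

```python
# Assumed corner layout on the unit square; the article only says that each
# corner represents one base, so this particular assignment is hypothetical.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the chaos game representation points for a DNA string."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # assumption: skip non-ACGT symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards this base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    # The first four points for the sequence "ACGT":
    for point in cgr_points("ACGT"):
        print(point)
    # (0.25, 0.25)
    # (0.125, 0.625)
    # (0.5625, 0.8125)
    # (0.78125, 0.40625)
```

Plotting every point for a whole mitochondrial genome (16,569 bases in humans) with any scatter-plot tool gives the kind of image shown in the figure.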
Suddenly, when faced with these visual representations of the genome, the underlying structure and organization become clear. Patterns emerge that differ markedly from species to species. With a simple mathematical manipulation, the genome can be transformed into something quantifiable.
By computing the “image distance” between every pair of graphical genome representations and then employing multidimensional scaling, Kari is transforming the tree of life into a map of life, with each DNA sequence represented as a single point in 3D space and the spatial proximity between any two points reflecting their degree of similarity.
Chaos game representation of the mitochondrial genomes of the human (top) and red algae (bottom).
When Kari’s group performed this comparison for more than 3,100 complete mitochondrial genomes, known taxonomic groups (mammals, amphibians, reptiles, etc.) clustered together in non-overlapping subsets with very few exceptions, and the resulting map agreed remarkably well with classical phylogenetic trees. What gave Kari goosebumps is that species that crept over the boundaries often still appeared logically placed, like species of fish with primitive lungs that slid into the amphibian cluster.
“This method captured the fact that they have common characteristics and you are really unable to untangle them,” says Kari. The map also placed the modern human closest to the chimp and furthest from the cucumber.
The beauty of this approach is that it doesn’t rely on direct comparisons of specific genes that may or may not exist in all organisms. The image distance employed, the Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length nine (DNA words nine bases long) in the sequences, without reference to what those sequences are or which organism’s genome they were taken from.
“It can be a computer-generated sequence that makes no sense whatsoever or alien DNA from outer space. I don’t care, bring it on,” laughs Kari.
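The remark that DSSIM “implicitly compares the occurrences of oligomers of length nine” follows from the geometry of the chaos game. Each step x → (x + corner)/2 shifts the binary digits of each coordinate one place and writes the new base’s bit in front, so if the square is divided into a 2^k × 2^k grid, the cell a point lands in is fixed by the last k bases read. Counting points per cell (often called a frequency CGR) therefore yields a table of k-mer counts; at k = 9 that is a 512 × 512 grid, one cell for each of the 4^9 = 262,144 possible 9-mers. The sketch below illustrates this under stated assumptions: the corner layout (A bottom-left, C top-left, G top-right, T bottom-right) is a hypothetical convention, sequences are assumed to contain only A, C, G and T, the helper names are hypothetical, and `frequency_distance` is a plain Euclidean stand-in, not the DSSIM used by Kari’s group.

```python
from collections import Counter

# Hypothetical corner layout, given as (x bit, y bit) for each base.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Frequency CGR: count chaos-game points in a 2**k by 2**k grid.

    At k-bit resolution a chaos-game step shifts the cell index right and
    puts the new corner bit in front, so the cell depends only on the last
    k bases. Tracking the bits as integers avoids float rounding.
    Assumes the sequence contains only A, C, G and T.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    col = row = 0
    for i, base in enumerate(sequence.upper()):
        bx, by = CORNER_BITS[base]
        col = (bx << (k - 1)) | (col >> 1)
        row = (by << (k - 1)) | (row >> 1)
        if i >= k - 1:  # count only once a full k-mer has been read
            grid[row][col] += 1
    return grid


def kmer_counts(sequence, k):
    """Count overlapping k-mers directly, for comparison with fcgr()."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_cell(kmer):
    """Grid cell (row, col) into which a given k-mer's points fall."""
    k = len(kmer)
    row = col = 0
    for base in kmer.upper():  # the last base becomes the leading bit
        bx, by = CORNER_BITS[base]
        col = (bx << (k - 1)) | (col >> 1)
        row = (by << (k - 1)) | (row >> 1)
    return row, col


def frequency_distance(grid_a, grid_b):
    """Euclidean distance between normalised FCGR grids (both non-empty).

    Illustrative stand-in only: DSSIM instead compares local luminance,
    contrast and structure between the two images.
    """
    total_a = sum(map(sum, grid_a))
    total_b = sum(map(sum, grid_b))
    return sum(
        (a / total_a - b / total_b) ** 2
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    ) ** 0.5


if __name__ == "__main__":
    seq = "ACGTTGCAACGGTACCA"
    grid = fcgr(seq, 2)
    # Every cell of the 4 x 4 grid holds exactly one dinucleotide count.
    for kmer, count in kmer_counts(seq, 2).items():
        row, col = kmer_cell(kmer)
        assert grid[row][col] == count
    print(frequency_distance(grid, fcgr("ATATATATGCGCGCGC", 2)))
```

A full pipeline along the lines the article describes would compute such a distance for every pair of genomes and pass the resulting matrix to a multidimensional scaling routine to obtain 3D coordinates.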