NTU Undergraduates' research April 2014 - Biosciences | Page 16

Examining the distribution of overlapping reading frames within viruses
Sukhdeep Kaur
15/04/2014
Biomedical Sciences, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS

Abstract: This study examined the distribution of overlapping genes within viruses and how that distribution differs from one virus family to another. It also examined the orientation of each overlap in detail, identifying whether double-stranded DNA (dsDNA) viruses had more convergent, divergent, parallel or nested overlaps. Viruses with complete genomes in the NCBI HTTP server database were downloaded as GenBank files and organised into four categories according to whether the genome was double- or single-stranded and whether it was DNA or RNA. A script was run in Python version 2.7.6 to extract the relevant information about the overlapping genes. One viral strain per family was selected at random and the data were imported into R. dsRNA viruses were not investigated further owing to a lack of data. The orientation of overlaps within each family was investigated, giving the following frequencies. dsDNA: convergent 0.17, divergent 0.1, parallel 0.363, nested 0.367. ssDNA: convergent 0.06, divergent 0.02, parallel 0.53, nested 0.38. ssRNA: convergent and divergent both 0, parallel and nested both 0.5. A chi-squared test rejected the null hypothesis at the 99% confidence level, indicating that the differences are unlikely to be due to chance. R was used to produce histograms and to collate the data so that the overlaps, and the nucleotide base pairs over which each overlap occurred, could be analysed within each family. The dsDNA family in the first section contained a large amount of data and was therefore examined further. We investigated whether certain orientations of overlap contributed more than others to the total number of overlaps found within dsDNA viruses.
Results showed that many of the shorter overlaps (measured in base pairs) were convergent, divergent or parallel, whereas the longer overlaps were nested. Given the degeneracy of the genetic code and the constraints on the codons, we formed expectations about what the coding sequences within the overlaps would be. This investigation could be extended by examining the actual coding sequences of the randomly selected overlapping genes to test whether those expectations are correct. Certain codons may be more abundant than others within viruses because of mutational biases and selective forces in the environment; for example, some codons may be translated faster than others, or may be less prone to mistranslation.
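The four overlap orientations discussed above can be illustrated with a short Python sketch. This is not the script used in the study. It assumes the usual definitions (nested: one gene lies entirely within the other; parallel: partial overlap on the same strand; convergent: opposite strands with overlapping 3' ends; divergent: opposite strands with overlapping 5' ends). The `Gene` record, the function names and the example coordinates are all hypothetical and chosen only for illustration; in practice the coordinates would be read from the GenBank feature tables.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    """Hypothetical gene record (not taken from the study's script)."""
    name: str
    start: int   # 1-based inclusive start on forward-strand coordinates
    end: int     # inclusive end, end >= start
    strand: int  # +1 for forward strand, -1 for reverse strand


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify_overlap(a: Gene, b: Gene) -> Optional[str]:
    """Return the assumed orientation class of an overlap, or None."""
    if overlap_length(a, b) == 0:
        return None
    # Put the gene that starts first on the left (ties: longer gene first).
    left, right = sorted((a, b), key=lambda g: (g.start, -g.end))
    if right.end <= left.end:
        return "nested"       # right lies entirely inside left
    if left.strand == right.strand:
        return "parallel"     # partial overlap, same strand
    if left.strand == 1:
        return "convergent"   # left -> and right <-: 3' ends overlap
    return "divergent"        # left <- and right ->: 5' ends overlap


def orientation_frequencies(genes):
    """Relative frequency of each orientation among all overlapping pairs."""
    counts = Counter()
    for a, b in combinations(genes, 2):
        kind = classify_overlap(a, b)
        if kind is not None:
            counts[kind] += 1
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


# Made-up coordinates for illustration only; not data from the study.
example_genes = [
    Gene("A", 1, 100, 1),
    Gene("B", 90, 200, 1),
    Gene("C", 190, 300, -1),
    Gene("D", 290, 400, 1),
    Gene("E", 320, 350, -1),
]

print(orientation_frequencies(example_genes))
```

Comparing every pair of genes is quadratic in the number of genes, which is acceptable for small viral genomes; sorting the genes by start position and comparing only neighbours would be a natural refinement for larger ones.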