Tags: Ancient DNA, Bioinformatics, Machine learning and Modern DNA
Abstract:
DNA, or deoxyribonucleic acid, carries the genetic information of every living organism. The study of bacterial DNA extracted from human bones excavated at archaeological and anthropological sites aims to analyse the evolution of the microorganisms inhabiting the human body and to provide new insight into the health, diet and even migration of our ancestors. This paper proposes a solution for discriminating between ancient and modern bacterial DNA in dental calculus. We propose three internal representations for the considered DNA sequences and analyse which one captures the most information relevant to classification models. Two of these are text based, while the third takes advantage of several physical and chemical properties of the nucleotides in DNA. We use a data set containing both ancient and modern dental calculus bacterial DNA and apply two supervised models, namely artificial neural networks and support vector machines, to distinguish between the two types of sequences. The obtained results indicate two main conclusions: the representation based on physical and chemical properties seems to best capture the information relevant to the task at hand; and, for the considered data set and DNA encoding proposals, support vector machines outperform artificial neural networks, although the results obtained by both models are promising.
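To make the distinction between text-based and property-based representations concrete, the following is a minimal sketch and not the encodings used in the paper. It assumes a k-mer frequency vector as one possible text-based representation, and a small table of standard nucleotide classes (purine vs. pyrimidine, amino vs. keto, strong vs. weak hydrogen bonding) as one possible set of physical and chemical properties. The function names `kmer_frequencies` and `property_profile` and the `PROPERTIES` table are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"

# Assumed illustrative property table (not taken from the paper):
# (is_purine, is_amino, strong_hydrogen_bonding)
PROPERTIES = {
    "A": (1, 1, 0),  # purine, amino, weak (2 hydrogen bonds)
    "C": (0, 1, 1),  # pyrimidine, amino, strong (3 hydrogen bonds)
    "G": (1, 0, 1),  # purine, keto, strong
    "T": (0, 0, 0),  # pyrimidine, keto, weak
}


def kmer_frequencies(seq, k=2):
    """Text-based encoding: relative frequency of every k-mer over ACGT."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made purely of A, C, G, T contribute to the total.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def property_profile(seq):
    """Property-based encoding: fraction of bases having each property."""
    bases = [b for b in seq.upper() if b in PROPERTIES]
    if not bases:
        return [0.0, 0.0, 0.0]
    n = len(bases)
    return [sum(PROPERTIES[b][j] for b in bases) / n for j in range(3)]


example = "ACGTACGGTC"
print(len(kmer_frequencies(example, k=2)))  # one feature per possible 2-mer
print(property_profile(example))
```

Either function yields a fixed-length numeric vector per sequence, which is the form of input that classifiers such as support vector machines or neural networks typically expect; the classifier itself is not shown here.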
Machine Learning Based Models for Examining Differences Between Modern and Ancient DNA