July 31, 2018
Title: An Improved Probabilistic Simulation Framework for Gene Family Evolution
Master's Degree Candidate: Soumya Kundu
Major Advisor: Dr. Mukul Bansal
Associate Advisors: Dr. Ion Mandoiu, Dr. Yufeng Wu
Date/Time: Tuesday, July 31, 2018 at 10:00 AM
Location: ITEB 336
Phylogenetic trees are frequently used to represent the evolution of species, genes, and protein domains. Gene family evolution is usually represented in a framework where gene trees evolve inside a species tree. The recent Domain-Gene-Species model of evolution presents a framework where protein domains also evolve inside one or more gene trees, each of which evolves inside a species tree. The Duplication-Transfer-Loss (DTL) reconciliation and the Domain-Gene-Species (DGS) reconciliation models allow for a parsimony-based approach to inferring the evolutionary histories of a given set of species, genes, and, in the case of DGS reconciliation, protein domains.
However, in the absence of biological data regarding the true evolutionary histories of these species, genes, and domains, we must rely on simulated data to validate the accuracy of such phylogenetic reconciliation methods. Although numerous probabilistic simulation frameworks exist for gene family evolution, such as PrIME-GenPhyloData and SimPhy, none of them accounts for certain aspects of gene family evolution, such as the presence of both additive and replacing horizontal gene transfers and the possibility that the gene family might not be present at the root of the species tree. Furthermore, no existing simulation framework simulates sub-gene level events such as partial gene transfers and the evolution of domain families.
In this work, we modify the PrIME-GenPhyloData simulation framework to simulate both replacing and additive horizontal gene transfers, account for phylogenetic distance bias in choosing transfer recipients, and randomly select the location of gene birth in the species tree. In addition, we introduce the ability to simulate sub-gene level events such as partial gene transfers through the simulated evolution of protein domains within gene families, creating the first probabilistic simulation framework of its kind.
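As a rough illustration of two of the ideas above (distance-biased choice of a transfer recipient, and the difference between additive and replacing transfers), the following minimal Python sketch may help. It is not the framework's implementation: the function names, the exponential weighting exp(-alpha * distance), and the parameter alpha are all assumptions made for this example, and the abstract does not specify the actual form of the bias.

```python
import math
import random


def recipient_weights(distances, alpha=1.0):
    """Normalized selection weights for candidate recipient lineages.

    ASSUMPTION (not from the abstract): the weight decays exponentially
    with phylogenetic distance from the donor, controlled by `alpha`.
    `distances` maps a candidate lineage name to its distance from the donor.
    """
    raw = {name: math.exp(-alpha * d) for name, d in distances.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def choose_recipient(distances, alpha=1.0, rng=random):
    """Draw one recipient lineage at random, biased toward closer lineages."""
    weights = recipient_weights(distances, alpha)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def apply_transfer(recipient_copies, transferred_gene, replacing, rng=random):
    """Return the recipient's gene copies after a transfer.

    Additive transfer: the recipient gains one extra copy.
    Replacing transfer: the incoming copy overwrites one existing copy
    (if the recipient has none, the transfer is treated as additive).
    """
    copies = list(recipient_copies)
    if replacing and copies:
        copies[rng.randrange(len(copies))] = transferred_gene
    else:
        copies.append(transferred_gene)
    return copies
```

In this sketch, setting alpha to 0 removes the bias and makes every candidate lineage equally likely, while larger values concentrate transfers on close relatives of the donor. Passing a seeded random.Random instance as rng makes a run reproducible.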
To demonstrate the utility of our new simulation framework, we systematically evaluate the accuracy of the DTL reconciliation algorithm on simulated datasets that contain both additive and replacing transfers. Our results from this simulation study indicate that DTL reconciliation, which assumes that all transfers are additive, is surprisingly robust to the presence of replacing transfers, and suggest that it should be possible to design effective heuristics for the DTL reconciliation problem with replacing transfers based solely on standard DTL reconciliation.