Ph.D. Defense: Abhijit Mondal
November 29 @ 11:00 am - 12:30 pm EST
Doctoral Proposal Oral Defense
Title: Algorithms for Understanding and Dating Microbial Evolution Through Horizontal Gene Transfer
Ph.D. Candidate: Abhijit Mondal
Major Advisor: Dr. Mukul S. Bansal
Associate Advisors: Dr. Ion Mandoiu, Dr. Derek Aguiar
Date/Time: Monday, November 29th, 2021, 11:00 AM.
Meeting link: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m6787e62e084b25e2f290fb678d41b525
Password: c923YKk9BDW (29239559 from phones)
Event number: 2620 481 9431
Join by phone: +1-415-655-0002
Access code: 2620 481 9431
Horizontal gene transfer (HGT) is one of the most important drivers of microbial evolution. Recent computational advances in the study of microbial gene families have made it possible to infer HGTs efficiently and with relatively high accuracy. Despite this, several fundamental aspects of HGTs remain poorly understood. Furthermore, several important problems in the study of microbial evolution could benefit from the evolutionary information present in inferred HGTs. In this dissertation proposal, we focus on developing new computational methods that leverage these advances in HGT inference to (i) distinguish between the two types of HGTs, additive and replacing, (ii) better understand the scale of HGT events by systematically detecting horizontal transfer of protein domains, and (iii) use inferred HGTs to improve microbial phylogenetic dating.
Our first contribution is the development of a supervised machine learning approach, called ARTra, for distinguishing between additive and replacing HGTs. An additive HGT occurs when the transferred gene adds itself as a new gene to the recipient genome, and a replacing HGT occurs when the transferred gene replaces an existing homologous copy of that gene. The complexity of microbial evolution makes it difficult to computationally distinguish between these two types of HGT. ARTra uses as features the classifications provided by several simple classification rules, along with phylogenetic information, and ensembles them to produce a more accurate classification. Rigorous experimental analysis using simulated and real data shows that ARTra is effective at distinguishing between additive and replacing HGTs.
Our second contribution is the development of a new constrained-optimization-based computational method, called DaTeR, that improves the dating of microbial phylogenetic trees by using HGTs to impose relative constraints on the dates assigned to different regions of the tree. The presence of an HGT event is informative for dating since it implies that the donor organism could not have lived any later (more recently) than the recipient of that HGT. However, traditional phylogenetic dating approaches do not make use of such HGT constraints, and more recent methods that do are limited in their ability to use them. Our method starts with a conventionally dated phylogenetic tree and then minimally modifies it to satisfy all available HGT constraints. We demonstrate the effectiveness and utility of DaTeR by applying it to real biological data.
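To make the idea of "minimally modifying" a dated tree concrete, here is a toy sketch. It is not DaTeR itself: the abstract does not specify DaTeR's objective or constraint encoding, so this example assumes a plain least-squares objective and assumes every constraint has the form "node X is at least as old as node Y", with ages measured as time before present (larger means older). The function name, the node names, and the use of Dykstra's alternating projection algorithm are all choices made for illustration.

```python
def enforce_order_constraints(ages, constraints, sweeps=1000, tol=1e-9):
    """Adjust node ages so that ages[older] >= ages[younger] for every
    (older, younger) pair, staying as close as possible (in the
    least-squares sense) to the input ages.

    Uses Dykstra's alternating projection algorithm, which converges to
    the Euclidean projection onto an intersection of convex sets. Each
    ordering constraint is a half-space, so each projection is simple:
    if the pair is out of order, both ages move to their mean.
    """
    x = dict(ages)
    # One correction term per constraint (hypothetical bookkeeping name).
    corr = [(0.0, 0.0) for _ in constraints]
    for _ in range(sweeps):
        max_change = 0.0
        for i, (older, younger) in enumerate(constraints):
            # Re-add the correction removed the last time this
            # constraint was enforced.
            ya = x[older] + corr[i][0]
            yb = x[younger] + corr[i][1]
            if ya >= yb:
                na, nb = ya, yb                # already in order
            else:
                na = nb = (ya + yb) / 2.0      # project onto ya == yb
            corr[i] = (ya - na, yb - nb)
            max_change = max(max_change,
                             abs(na - x[older]), abs(nb - x[younger]))
            x[older], x[younger] = na, nb
        if max_change < tol:
            break
    return x


# Hypothetical three-node example: tree edges say the root is older than
# both other nodes; one inferred HGT says "donor" is at least as old as
# "recipient", which the initial dating violates (10 < 12).
ages = {"root": 20.0, "donor": 10.0, "recipient": 12.0}
constraints = [("root", "donor"), ("root", "recipient"),
               ("donor", "recipient")]
adjusted = enforce_order_constraints(ages, constraints)
print(adjusted)  # {'root': 20.0, 'donor': 11.0, 'recipient': 11.0}
```

In this toy version, tree edges and HGT constraints are handled uniformly as ordering constraints, and only the two conflicting nodes move. A real dating method would also need to handle fixed calibration points, branch-level rather than node-level constraints, and uncertainty in the inferred HGTs.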
Finally, as our third contribution, we propose to extend a recently developed computational framework, called the Domain-Gene-Species reconciliation framework, for studying the evolution of protein domains within and across gene families. The existing framework allows for the co-inference of gene-level and domain-level evolutionary events in multicellular eukaryotes (which generally have negligible horizontal gene transfer). We propose to extend it by allowing genes and protein domains to move across species boundaries through horizontal transfer. Our preliminary work on this problem has shown that the underlying computational problem is NP-hard but approximable to within a constant factor, where the specific approximation ratio depends on the “event costs” used for the reconciliation. Our ongoing work is focused on developing effective heuristic solutions for the problem and on applying our new algorithms to both simulated and real biological data.