Ph.D. Defense: Timothy Becker

July 19 @ 12:00 pm - 1:30 pm EDT

Title: Machine Learning Methods for Complex Structural Variation Analysis

Ph.D. Candidate: Timothy Becker

Major Advisor: Dong-Guk Shin

Associate Advisors: Ion Mandoiu, Yufeng Wu

Additional Readers: Sheida Nabavi, Derek Aguiar

Date/Time: Monday July 19th, 12:00pm

Meeting Link: https://uconn-cmr.webex.com/uconn-cmr/onstage/g.php?MTID=e45b8f418764deec9c4e02f87f02241a0

Meeting Number: 120 709 1994

Password: xAaJipEC332

Abstract: Detecting structural variants (SVs), germline variations larger than 50 nucleotide bases, in normal tissue with high accuracy remains challenging with high-throughput DNA sequencing. Somatic SV calling is even more difficult because of the heterogeneity and allele complexity associated with tumor tissues. We first show that existing germline SV calling accuracy can benefit from a supervised ensemble training method called FusorSV. This method, however, is not immediately applicable to somatic SV calling, since there are far fewer somatic SV callers than germline callers and no open-source datasets to learn from. We therefore developed three complementary methods that are used in conjunction to form a somatic SV calling framework. We first describe a feature moment extraction system called HFM that extracts important SV signatures such as read depth, insert size, mapping quality, and orientation information from the reads. Next, we detail a supervised variable-topology neural network called TensorSV that is designed to work with HFM features and variable heterogeneity VCF calls. To overcome the lack of gold-standard somatic SV datasets, we designed a complex somatic genome generation framework called somaCX that generates a germline genome and then simulates the somatic evolutionary process on that basis using non-uniform distributions that account for gene function. We then construct a somatic training set with high SV rates and oncogene enrichment and use it to build specialized TensorSV models for the deletion (DEL), duplication (DUP), and inversion (INV) types. We strike a balance in somatic SV calling accuracy by using four-class models: one class for no event, two for the standard germline frequencies, and one specifically designed for low-frequency events that are present only in mixed-purity tissues.
We demonstrate that these models are effective by applying them to open-source TCRB samples. We find that our SV calls in the oncogene regions have higher enrichment than those in healthy background samples and, in addition, have highly correlated differential oncogene expression patterns.

