
PhD Proposal – Timothy Becker

November 1, 2017 @ 11:30 am - 12:30 pm UTC-5

Title: Machine Learning methods for Complex Structural Variation analysis

Student: Timothy Becker

Major Advisor: Dr. Dong-Guk Shin

Associate Advisors: Dr. Yufeng Wu, Dr. Ion Mandoiu

Date/Time: Wednesday, November 1st, 2017 at 11:30am in Babbidge 1947 meeting room


Detecting Structural Variants (SVs), variations larger than 50 nucleotide bases, with current DNA sequencing technology remains challenging in normal tissues and becomes more problematic with the increased heterogeneity and allele complexity found in tumor tissues.  We focus on three areas:


(1) Multi-input SV arbitration method

The first part of the dissertation describes a multi-input SV fusion method (FusorSV) that uses features and prior knowledge to produce a comprehensive, arbitrated call set.  We show that this approach works well on deletion, duplication and inversion call types in germline data by constructing a fully automated SV calling engine (SVE) that runs eight popular calling algorithms on the freely available 1000 Genomes Phase 3 high-coverage data set.  Using SV type and length as features, FusorSV outperformed existing algorithms in 1000 rounds of permutation testing and achieved a concordantly high in vitro validation rate, in excess of 85%, for novel SV events.


(2) Somatic genome generation method

The second area details a genome generator (soMaCX) that models somatic evolution from subclonal to cancer stem cell instances under continuous control.  Joint SV distributions are constructed from SV type, size, complexity and region controls.  Gain and loss of function are modeled by considering the SV type and its positive or negative effect after transcription, thereby providing the mechanism needed to simulate selective pressure on oncogenes and replication regions like the NHEJ pathway.  To give the user control of sample purity, reads are simulated for both normal and somatic tissues, and the resulting data are randomly sampled.
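
As a rough illustration of the purity control described above, the following Python sketch mixes two pools of simulated reads so that a chosen fraction comes from the somatic sample. It is not taken from soMaCX: the function name mix_reads, its parameters, and the choice of sampling without replacement are all assumptions made for this example.

```python
import random


def mix_reads(normal_reads, somatic_reads, purity, n_reads, seed=0):
    """Return n_reads reads, a `purity` fraction of them from somatic_reads.

    Hypothetical helper for illustration only; not part of soMaCX.
    Reads are drawn without replacement from each pool, then shuffled.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    rng = random.Random(seed)
    n_somatic = round(purity * n_reads)
    n_normal = n_reads - n_somatic
    if n_somatic > len(somatic_reads) or n_normal > len(normal_reads):
        raise ValueError("not enough simulated reads for the requested mix")
    mixed = rng.sample(somatic_reads, n_somatic) + rng.sample(normal_reads, n_normal)
    rng.shuffle(mixed)  # interleave so read order carries no tissue label
    return mixed


if __name__ == "__main__":
    # Placeholder read identifiers standing in for simulated reads.
    normal = [f"N{i}" for i in range(100)]
    somatic = [f"S{i}" for i in range(100)]
    sample = mix_reads(normal, somatic, purity=0.7, n_reads=10)
    print(sum(r.startswith("S") for r in sample), "somatic reads out of", len(sample))
```

Fixing the somatic count at round(purity * n_reads) keeps the realized purity exact for a given sample size; drawing each read independently with probability purity would be an equally reasonable design with some sampling noise.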


(3) Sequence feature extraction method and application

The final part of this dissertation comprises a sequence feature extraction framework (SAFE) and its application to somatic SV analysis.  We propose a genomic signal processor framework that abstracts and transforms sequences and alignment entries into feature vectors such as read depth, split read depth, clipped read depth, supplemental read depth, strand bias, k-mer frequency and nucleic acid proportion.  Integral to this framework will be an out-of-core data structure that offers efficient random access, normalization and indexing on large data sets.  We will then demonstrate its effectiveness by applying machine learning methods, in conjunction with SAFE, to SV allele complexity and heterogeneity.
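
Two of the sequence features named above, k-mer frequency and nucleic acid proportion, can be sketched in a few lines of Python. This is only an illustration of what such features look like for a single sequence window: the function names kmer_frequencies and base_proportions are hypothetical, and nothing here reflects SAFE's actual interface or its out-of-core storage.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer in seq (illustrative only)."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if k <= 0 or total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: count / total for kmer, count in counts.items()}


def base_proportions(seq):
    """Fraction of A, C, G and T in seq; other symbols count toward the length."""
    seq = seq.upper()
    n = len(seq)
    return {base: (seq.count(base) / n if n else 0.0) for base in "ACGT"}


if __name__ == "__main__":
    window = "ACGTAC"  # placeholder sequence window
    print(kmer_frequencies(window, 2))
    print(base_proportions(window))
```

Features of this kind would be computed per genomic window and concatenated with depth-based features to form the vectors passed to a learning method.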


HBL Class of 1947 Conference Room
UConn Library, 369 Fairfield Way, Unit 1005
Storrs, CT 06269 United States
(860) 486-2518
