Ph.D. Defense: Fatima Zare
September 16, 2020 @ 1:00 pm - 2:00 pm EDT
Title: Developing Novel Copy Number Variation Detection Method using Emerging Sequencing Data
Student: Fatima Zare
Major Advisor: Prof. Sheida Nabavi
Associate Advisors: Prof. Ion Mandoiu and Prof. Jeffrey Chuang
Date/Time: Wednesday, Sep 16, 2020, 1:00-2:00 P.M.
Doctoral Dissertation Oral Defense
Hosted by Fatemeh Zare
Wednesday, Sep 16, 2020 1:00 pm | 2 hours | (UTC-04:00) Eastern Time (US & Canada)
Meeting number: 120 659 2420
Join by phone
+1-415-655-0002 US Toll
Access code: 120 659 2420
Copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity to detect CNVs more accurately. The main goal of this dissertation is to develop CNV detection methods for next-generation sequencing (NGS) data, including whole-exome sequencing (WES) data and single-cell sequencing (SCS) data.
Whole-exome sequencing is the most commonly used sequencing platform in practice because of its relatively low cost and modest data size. In this dissertation, we first introduce a novel preprocessing pipeline to improve the detection accuracy of CNVs in heterogeneous NGS data such as cancer WES data. The pipeline includes several normalization steps to reduce biases due to GC content, mappability, and tumor contamination, and a denoising step to reduce noise and increase the detection power of CNV detection methods. The denoising method is based on Taut String, an efficient implementation of the solution to the change-point optimization problem. Furthermore, we propose a novel segmentation algorithm to detect CNVs more precisely and efficiently using WES data. The proposed method employs Taut String to smooth the read depth data and to generate piecewise constant signals as CNV segments. It also filters out outlier read counts and identifies significant change points to reduce false positives. We used real and simulated data to evaluate the proposed method and compare it with other commonly used CNV detection methods. We show that the proposed segmentation method outperforms existing CNV detection methods in terms of accuracy and false discovery rate and runs faster than the circular binary segmentation method.
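As an illustration of the GC-content normalization step mentioned above, the following minimal Python sketch applies a common median-ratio correction: windows are grouped by GC fraction, and each read count is rescaled by the ratio of the overall median to the median of its GC group. The abstract does not state which correction the pipeline uses, so the function name `gc_normalize`, the `n_strata` parameter, and the median-ratio scheme itself are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, n_strata=20):
    """Median-ratio GC correction (illustrative; not necessarily the dissertation's method).

    read_counts  -- read count per window (e.g., per exon)
    gc_fractions -- GC fraction in [0, 1] for the same windows
    n_strata     -- hypothetical number of equal-width GC groups
    """
    if len(read_counts) != len(gc_fractions):
        raise ValueError("read_counts and gc_fractions must have the same length")

    def stratum(gc):
        # Map a GC fraction to a group index; gc == 1.0 goes in the last group.
        return min(int(gc * n_strata), n_strata - 1)

    overall_median = median(read_counts)

    # Collect the read counts that fall in each GC group.
    groups = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        groups[stratum(gc)].append(count)
    group_median = {key: median(values) for key, values in groups.items()}

    corrected = []
    for count, gc in zip(read_counts, gc_fractions):
        m = group_median[stratum(gc)]
        # Leave the count unchanged when its group median is zero.
        corrected.append(count * overall_median / m if m > 0 else float(count))
    return corrected


# Toy input: GC-rich windows show twice the depth of the others.
counts = [100, 100, 100, 200, 200, 200]
gc = [0.3, 0.3, 0.3, 0.6, 0.6, 0.6]
corrected = gc_normalize(counts, gc)
```

In this toy input, all six windows share the same value after correction, so the depth difference attributable to GC content is removed. Real data would also need the mappability and tumor-contamination corrections described above, which this sketch does not attempt.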
Next-generation sequencing has been successfully adapted to sequence complete genomes at the single-cell level. Single-cell sequencing is a useful tool for determining somatic genomic heterogeneity; however, it introduces new challenges in data analysis. The general steps of a CNV detection method for SCS data are GC correction, binning, removal of outlier bins, segmentation, and removal of outlier cells. Current tools designed for bulk sequencing are not optimized for single-cell data. Therefore, segmentation, normalization, and denoising techniques designed explicitly for SCS data are necessary. In this dissertation, we present a novel CNV detection algorithm based on a modified Taut String method to detect CNVs from SCS data. The proposed method first finds the optimal window size for counting reads from whole-genome SCS data using the Akaike information criterion (AIC) and then removes outliers from the read count signal. Then, using the modified Taut String algorithm, the method detects CNVs and identifies significant change points. Finally, it applies hierarchical clustering to the cells based on their CNV patterns and employs z-scores to improve CNV detection across cells. Using simulated and real data, we compare the proposed method with other commonly used CNV detection methods and show that it outperforms them in terms of sensitivity and false discovery rate.
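To make the window-size selection step more concrete, the sketch below scores candidate window sizes with AIC and picks the lowest score. The abstract does not give the likelihood used, so this example assumes a textbook histogram-density model in which the read density is constant within each window; the names `histogram_aic` and `best_window_size` and the candidate sizes are hypothetical.

```python
import math


def histogram_aic(read_positions, genome_length, window_size):
    """AIC of a piecewise-constant read-density model (an assumed model, for illustration).

    read_positions -- 0-based start position of each read, all < genome_length
    genome_length  -- length of the region, assumed divisible by window_size
    window_size    -- candidate window (bin) width in bases
    """
    if genome_length % window_size != 0:
        raise ValueError("genome_length must be a multiple of window_size")

    n_windows = genome_length // window_size
    counts = [0] * n_windows
    for pos in read_positions:
        counts[pos // window_size] += 1

    n_reads = len(read_positions)
    # Log-likelihood of the histogram density estimate c / (n_reads * window_size).
    log_lik = sum(
        c * math.log(c / (n_reads * window_size)) for c in counts if c > 0
    )
    # The window densities must integrate to one, leaving n_windows - 1 free parameters.
    return -2.0 * log_lik + 2 * (n_windows - 1)


def best_window_size(read_positions, genome_length, candidates):
    """Return the candidate window size with the lowest AIC."""
    return min(
        candidates,
        key=lambda w: histogram_aic(read_positions, genome_length, w),
    )


# Perfectly uniform coverage: one read starting at every position.
uniform_reads = list(range(1000))
uniform_choice = best_window_size(uniform_reads, 1000, [10, 100, 500])

# Coverage with a step: 900 reads in the first half, 100 in the second.
stepped_reads = [i % 500 for i in range(900)] + [500 + 5 * i for i in range(100)]
```

With uniform coverage every window size fits equally well, so the parameter penalty favors the largest candidate. When the depth changes along the genome, as in `stepped_reads`, window sizes small enough to resolve the change obtain a lower AIC than a single genome-wide window.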