January 31, 2020
Title: Developing Computational Methods for Single-cell RNA Sequencing Data Analysis
Student: Tianyu Wang
Major Advisor: Prof. Sheida Nabavi
Associate Advisors: Prof. Ion Mandoiu and Prof. Derek Aguiar
Date/Time: Friday, Jan 31, 2020, 4:00pm
Location: ITE 336
Single-cell RNA sequencing (scRNAseq) makes it possible to uncover cell-specific changes in the transcriptome that are missed by bulk sequencing. This emerging and fast-growing sequencing technology has had a major impact on several fields, including microbiology, neurobiology, immunology, developmental biology, cancer research, and stem cell research. With rapid advances in single-cell technologies, the data continue to grow in size and complexity, and scalable, flexible, and robust computational methods for analyzing these data have become an urgent need. Because of the low amount of RNA in a single cell and the low capture efficiency of sequencing technologies, single-cell sequencing poses new challenges for data analysis, including data multimodality, extensive noise, and dropout (missing data). Addressing these challenges requires novel computational methods.
In this study, we developed new methods that employ non-parametric approaches and incorporate prior biological knowledge about gene interactions for three main applications in scRNAseq data: (1) differential gene expression analysis, (2) cell clustering analysis, and (3) imputation of dropout zero expression values. For differential gene expression analysis in multimodal scRNAseq data, we developed a new method based on the Earth Mover’s Distance (EMD). The EMD is a non-parametric measure of the distance between two distributions, obtained by solving a transportation optimization problem. For cell clustering analysis, we developed a new distance metric for pairs of cells that combines prior knowledge of gene relationships with gene expression values. We used a word embedding approach to obtain vector representations of genes and used these vectors to compute the distance between genes. For estimating zero expression values in scRNAseq data, we employed a method based on low-rank matrix completion with side information, which takes advantage of gene associations as prior knowledge for better imputation.
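To make the EMD idea concrete, here is a minimal sketch of how the distance could be computed for a single gene whose expression has been binned into histograms for two groups of cells. It is an illustration only, not the method developed in the thesis. It assumes both histograms share the same equally spaced bins, and it relies on the fact that in one dimension the optimal transportation cost equals the area between the two cumulative distributions. The function name `emd_1d`, the `bin_width` parameter, and the example counts are hypothetical.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Earth Mover's Distance between two 1-D histograms on shared, equally spaced bins.

    Illustrative sketch only: the names and the binning assumption are hypothetical.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must use the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each histogram must contain positive total mass")

    # Normalize counts so both histograms are probability distributions.
    p = [count / total_a for count in hist_a]
    q = [count / total_b for count in hist_b]

    # In 1-D, the minimum transport cost is the area between the two CDFs.
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical binned expression counts for one gene in two cell groups:
# group A is bimodal (mass at both extremes), group B is concentrated in the middle.
group_a = [5, 0, 0, 5]
group_b = [0, 5, 5, 0]
distance = emd_1d(group_a, group_b)
print(distance)
```

In this example the two groups have the same mean bin position, so a comparison of means alone would see no difference, while the EMD is nonzero because half of the mass must move one bin inward from each end. This is the kind of multimodal difference a distribution-level distance can pick up.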