September 6, 2018
Title: Deep Pathway Analysis
Ph.D. Candidate: Yue Zhao
Major Advisor: Dr. Dong-guk Shin
Associate Advisors: Dr. Lynn Kuo, Dr. Jinbo Bi, Dr. Sheida Nabavi, Dr. Charles Giardina
Day/Time: Thursday, September 6th, 2018, 11:00 am
Location: HBL Class of 1947 Conference Room
In this era of biomedical data, current research uses genomics data (specifically, gene expression data) and compares it with previously known gene regulation relationships, which are typically organized into curated molecular pathways, so as to obtain more accurate and interpretable results. Current pathway analysis methodologies organize known gene-gene interaction relationships into topological pathways and analyze omics data on top of them so that the activated or suppressed state of a pathway can be revealed computationally. However, pathway analysis research is still in its infancy. A new pathway analysis framework utilizing Bayesian Networks is proposed here to pinpoint the aberrant portions of pathways in cancer studies.
Our proposed approach encodes each pathway route as a Bayesian Network initialized with a sequence of conditional probabilities designed to incorporate the directionality of the regulatory relationships in the pathway, i.e., activation and inhibition. We demonstrate the effectiveness of our pathway analysis method through two case studies. First, we validate our model through simulation, in which the model is able to distinguish patients in the Test Group from those in the Control Group. Second, we apply our model to the breast cancer data set available from TCGA, using pathways available from KEGG. Furthermore, we compare this approach against other pathway analysis tools, and the results show that our tool achieves a comparable Area Under the Curve (AUC) for the Receiver Operating Characteristic (ROC) curve.
Regarding future work, we are interested in exploring whether pathway analysis can address a known problem in applying machine learning techniques to genomics data mining: using machine learning effectively on gene expression data requires a very large number of samples, but the field typically does not have that many. This problem is further aggravated if one plans to use neural networks.
The main idea is to preprocess the raw omics data with our new pathway analysis approach and obtain image-like data containing the activated pathway interactions for each given gene expression sample. Three methods will be compared: (1) Method 1 - image-like data is created through preprocessing using publicly available curated pathways (e.g., KEGG), (2) Method 2 - image-like data is created through preprocessing with decoy pathways, (3) Method 3 - no preprocessing is done and raw expression levels are used directly. As a preliminary step, computational experiments were performed with two real data sets, one from a single-cell breast cancer study and one from a cell cycling microarray study, and the results are promising in both cases. Method 1 yielded a higher AUC for the precision-recall curve than Method 2 and Method 3. This preliminary result suggests that our proposed approach could be useful in handling the small data set problem in gene expression data studies.
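As an illustration of how the three methods above might be compared on a precision-recall basis, the sketch below computes average precision, one common summary of the area under the precision-recall curve. This is an assumption about the metric's exact form, not a description of the evaluation actually performed. The function name and the toy labels and scores are hypothetical, and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = positive) and real-valued scores.

    Samples are ranked by descending score; the precision at the rank of each
    positive sample is averaged over all positives.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Toy example with made-up classifier scores for four samples.
labels = [1, 0, 1, 0]
ap_mixed = average_precision(labels, [0.9, 0.8, 0.7, 0.1])
ap_perfect = average_precision(labels, [0.9, 0.2, 0.8, 0.1])
```

Running the same function on the classifier scores obtained under each method would give one number per method, which can then be compared directly.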