Backward Chaining Rule Induction Using Multiple Genomic Data Types to Understand Gene Interactions in Ovarian Cancer

Author: Srinivasa C. Cherkuri, MBBS, MPH

Primary Advisor: Kim Dunn, MD, PhD

Committee Members: Noriaki Aoki, MD, PhD; Mary Edgerton, MD, PhD

Master's thesis, The University of Texas School of Biomedical Informatics at Houston.


Objective: To identify and understand the mechanisms underlying low-survival (high-risk) groups of patients with ovarian serous cystadenocarcinoma.

Data: mRNA expression, DNA methylation, and microRNA data for 456 advanced ovarian cancer samples from The Cancer Genome Atlas (TCGA) project. The samples were divided into a training set (two-thirds, 304 samples) and a test set (one-third, 152 samples) with similar survival distributions.
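The abstract does not say how the survival-matched split was made. One simple way to get a two-thirds/one-third split whose survival distributions are similar is to sort samples by survival time and, within each consecutive block of three, send one randomly chosen sample to the test set. The sketch below assumes exactly that; the helper name `survival_matched_split` and the synthetic survival times are hypothetical. It also ignores censoring status, which a fuller version might stratify on as well.

```python
import random

def survival_matched_split(times, seed=0):
    """Hypothetical helper: ~2/3 train, ~1/3 test with similar survival times.

    Samples are sorted by survival time; within each consecutive block of
    three, one sample is drawn at random for the test set. Censoring is ignored.
    """
    rng = random.Random(seed)
    order = sorted(range(len(times)), key=lambda i: times[i])
    train, test = [], []
    for start in range(0, len(order), 3):
        block = order[start:start + 3]
        if len(block) == 3:
            pick = rng.randrange(3)
            test.append(block[pick])
            train.extend(b for k, b in enumerate(block) if k != pick)
        else:
            # Leftover samples (when n is not a multiple of 3) go to training.
            train.extend(block)
    return sorted(train), sorted(test)

# Synthetic survival times (months) for 456 samples; not TCGA data.
rng = random.Random(1)
times = [rng.expovariate(1 / 30) for _ in range(456)]
train, test = survival_matched_split(times)
print(len(train), len(test))  # 304 152
```

Because every block of three neighbouring survival times contributes two samples to training and one to testing, both sets span the full survival range in the same proportions.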

Results: Thirteen genes identified by supervised principal component analysis showed a significant survival difference in both the rank-normalized training and test datasets. Backward chaining rule induction (BCRI) on two genes identified 48 new pathways in the low-survival patient group, starting from Fc fragment of IgG binding protein (FCGBP) as the target variable.
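Supervised principal component analysis, in its general form, screens genes one at a time for association with survival and then summarizes only the retained genes with their first principal component. The sketch below illustrates that idea under stated assumptions, not the thesis pipeline: each gene is rank-normalized across samples, screened with a standardized univariate Cox score statistic at beta = 0, kept if the absolute score exceeds an assumed threshold of 3.0, and the retained genes are projected onto PC1 by power iteration. All function names, the threshold, and the toy data are hypothetical.

```python
import math
import random

def rank_normalize(values):
    """Map values to average ranks scaled into (0, 1]; ties share their mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / n
        i = j + 1
    return ranks

def cox_score(x, times, events):
    """Standardized univariate Cox score statistic at beta = 0.

    Each observed event adds (x_i - risk-set mean) to U and the risk-set
    variance of x to the information I; tied event times share a risk set.
    """
    u = info = 0.0
    for i in range(len(x)):
        if not events[i]:
            continue
        risk = [x[j] for j in range(len(x)) if times[j] >= times[i]]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        info += sum((r - mean) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(info) if info > 0 else 0.0

def first_pc_scores(rows, iters=200, seed=0):
    """Project samples (rows of gene values) onto the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[g] for r in rows) / n for g in range(p)]
    centered = [[r[g] - means[g] for g in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive start vector
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [sum(c[g] * v[g] for g in range(p)) for c in centered]

def supervised_pca(expr, times, events, threshold=3.0):
    """Keep genes whose |Cox score| exceeds `threshold`, then score samples on PC1."""
    ranked = {gene: rank_normalize(vals) for gene, vals in expr.items()}
    selected = [g for g, vals in ranked.items()
                if abs(cox_score(vals, times, events)) > threshold]
    if not selected:
        return selected, []
    rows = [[ranked[g][s] for g in selected] for s in range(len(times))]
    return selected, first_pc_scores(rows)

# Toy data: gene "g0" tracks survival time; g1..g10 are pure noise.
rng = random.Random(42)
n = 60
times = [rng.expovariate(1 / 30) for _ in range(n)]
events = [1] * n
expr = {"g0": [t + rng.gauss(0, 1) for t in times]}
for k in range(1, 11):
    expr[f"g{k}"] = [rng.gauss(0, 1) for _ in range(n)]
selected, pc1 = supervised_pca(expr, times, events)
```

The PC1 scores can then be split (for example at the median) into higher- and lower-risk groups; the sign of a principal component is arbitrary, so which side is "high risk" has to be read off the survival curves rather than the sign.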

Conclusion: This preliminary analysis illustrates the potential of combining multiple genomic data types to understand the mechanisms underlying different types of disease, and the importance of data mining techniques in high-throughput data analysis. An improved version of this analysis could help narrow the search for potential therapeutic targets and for genes to validate experimentally in the laboratory in ovarian cancer patients. BCRI identified genes that recovered some known pathways, as well as new gene interactions that could be supported by validation in additional test datasets.