GS11 1053 Data Mining Methodology

Rodin, Andrei. Three semester hours. Summer annually. Prerequisites: Introductory statistics, genetics, basic math and algebra skills.

In this course we will cover the application of novel data mining, machine learning, and artificial intelligence methods to the analysis of large genetic epidemiology datasets. The emphasis will be on data analysis in wide-scale (genomic, or genome-wide) association studies of complex diseases, such as cardiovascular disease (CVD), where large numbers of small effects pose numerous problems for traditional statistical methodology. Among other methods, feature construction and feature set reduction, classification, clustering, and dependency modeling will be covered in detail. For comparison, we will also briefly cover (1) applications of the same methodology in different but related fields (such as gene expression studies), and (2) more traditional approaches to genetic epidemiology data analysis (such as multiple testing corrections).