GS11 1053: Data Mining Methodology

Yamal, Jose-Miguel. Three semester hours. Spring annually. Prerequisites: introductory statistics, genetics, and basic math and algebra skills.

This course will cover applications of various novel data mining, machine learning, and artificial intelligence methods to the analysis of large genetic epidemiology datasets. The emphasis will be on data analysis in large-scale (genomic, or genome-wide) association studies of complex diseases such as cardiovascular disease (CVD), where a large number of small effects poses numerous problems for traditional statistical methodology. Among other methods, feature construction and feature set reduction, classification, clustering, and dependency modeling will be covered in detail. For comparison purposes, the course will also briefly cover (1) applications of the same novel methodology in different but related fields, such as gene expression studies, and (2) more traditional approaches to genetic epidemiology data analysis, such as multiple testing corrections.