Data Mining of Gene Expression Related to Folates
Author: Magdalena J Wisniewski, BA
Master's thesis, The University of Texas School of Health Information Sciences at Houston.
The main objective of this study was to use data mining techniques to investigate a gene expression database containing gene responses under folic acid supplementation. The genes in this database, all of which are folic acid responsive, were analyzed to determine the presence of CpG islands. Several publicly available software tools were used to analyze the 5' untranslated region (UTR) of each gene for CpG islands. The first step was to retrieve the sequences and determine the correct criteria for them; the UCSC Table Browser, NCBI, and the Ensembl Genome Browser were used for this step. The second step used EMBL-EBI EMBOSS to determine the presence of CpG islands. CpG islands are unmethylated regions of the genome that are associated with the 5' ends of most housekeeping genes and many regulated genes, and they may play a role in governing folate-responsive genes. Of the genes analyzed, approximately 39% tested positive for CpG islands.
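The CpG island test described above can be illustrated with a minimal Python sketch. It is not the EMBOSS implementation used in the study. It assumes the commonly cited sliding-window criteria (a window of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6); the function names, parameter names, and threshold defaults are illustrative assumptions, and the criteria actually applied in the thesis may differ.

```python
def cpg_stats(seq):
    """Return (GC percent, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # observed CpG dinucleotides
    gc_percent = 100.0 * (c + g) / n
    expected = c * g / n   # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_percent, obs_exp


def has_cpg_island(seq, window=200, min_gc=50.0, min_obs_exp=0.6):
    """Report whether any window of the sequence meets the assumed criteria.

    The thresholds are assumptions chosen for illustration, not values
    taken from the thesis.
    """
    seq = seq.upper()
    # Slide a fixed-size window one base at a time along the sequence.
    for start in range(0, len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    print(has_cpg_island("CG" * 100))
    print(has_cpg_island("AT" * 100))
```

Applying such a function to each retrieved 5' UTR sequence and dividing the number of positives by the number of genes tested would give a proportion comparable in kind to the 39% reported above, although the exact figure depends on the criteria and sequence boundaries chosen.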