HIGH-THROUGHPUT AUTOMATED PRIMER DESIGN

Author: Ye Yaun

Primary Advisor: Hongbin Wang, PhD (co-author)

Committee Members: David Steffen (co-author); Richard A. Gibbs (co-author); Preethi H. Gunaratne (co-author); John D. McPherson (co-author)

Master's thesis, The University of Texas School of Health Information Sciences at Houston.

The Human Genome Sequencing Center (HGSC) at Baylor College of Medicine (BCM) has collaborated with the National Institutes of Health (NIH) and four other sequencing centers in the nation on the Mammalian Gene Collection (MGC) project. A functional pipeline for the rescue of full open reading frame (ORF) cDNAs has been established and used to generate more than 570 human and mouse cDNA clones suitable for the MGC. In the first round, a rule-based automated PCR primer design program was developed using an iterative primer selection algorithm in which the sequences were subjected to three different stringent criteria. The program proved able to generate PCR primer designs for the vast majority of the candidate sequences, but there was no guarantee that it would always pick the best candidate. A PCR assay success rate of about 48% was reported, a figure that includes the effects of artificial factors. Moreover, no set of guidelines used by traditional rule-based methods can always accurately predict the success of a primer. As an alternative, a prototype high-throughput automated primer design program was implemented using machine learning, an artificial intelligence technology, to break through these limitations. Simulation results showed that it outperformed the simple rule-based primer-picking program. Directions for further development are discussed.
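
The iterative, rule-based selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual rules, so everything here is an assumption made for illustration: the three tiers are taken to run from strictest to most relaxed, the rules are limited to GC fraction and a Wallace-rule melting temperature (a rough approximation), and the thresholds, the 20-nt primer length, and the names `CRITERIA`, `passes`, and `pick_primer` are hypothetical. It is not the program described in the thesis.

```python
# Hypothetical sketch of iterative rule-based primer picking.
# All thresholds and rules below are assumptions, not the thesis's criteria.

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough melting temperature by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Three tiers of criteria, strictest first (assumed values).
CRITERIA = [
    {"gc": (0.45, 0.55), "tm": (58, 62)},
    {"gc": (0.40, 0.60), "tm": (55, 65)},
    {"gc": (0.30, 0.70), "tm": (50, 70)},
]

def passes(seq, rule):
    """True if the candidate satisfies both ranges of one tier."""
    gc_lo, gc_hi = rule["gc"]
    tm_lo, tm_hi = rule["tm"]
    return (gc_lo <= gc_fraction(seq) <= gc_hi
            and tm_lo <= wallace_tm(seq) <= tm_hi)

def pick_primer(template, length=20):
    """Return (primer, tier) for the first window that passes the
    strictest tier possible, or (None, None) if no window passes any tier."""
    windows = [template[i:i + length]
               for i in range(len(template) - length + 1)]
    for tier, rule in enumerate(CRITERIA, start=1):
        for window in windows:
            if passes(window, rule):
                return window, tier
    return None, None

if __name__ == "__main__":
    primer, tier = pick_primer("ATGC" * 6)
    print(primer, tier)
```

A scheme like this returns the first candidate that passes a tier rather than ranking all candidates, which is one way a rule-based picker can produce a valid design without any guarantee that it is the best one.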