Hua Xu, PhD

Robert H. Graham Associate Professor

Dr. Hua Xu is an associate professor at the School of Biomedical Informatics at The University of Texas Health Science Center at Houston (UTHealth). He directs the Center for Computational Biomedicine at UTHealth. He currently chairs the American Medical Informatics Association (AMIA) Natural Language Processing (NLP) working group. Dr. Xu received his Ph.D. in Biomedical Informatics from Columbia University in 2008. In addition, he holds a B.S. in Biochemistry from Nanjing University in China and an M.S. in Computer Science from New Jersey Institute of Technology. Dr. Xu is an expert in biomedical text processing and data mining. His primary research interests include: 1) natural language processing of clinical text; 2) text mining of biomedical literature; and 3) healthcare data mining. He is the author of many publications on biomedical NLP and text mining, and his research on medication extraction received the Homer Warner Award from AMIA in 2009. Dr. Xu has been principal investigator on a number of grants, including R01s from the National Library of Medicine (NLM) and the National Cancer Institute (NCI).

curriculum vitae

School of Biomedical Informatics
7000 Fannin St., Suite 870, Houston, TX 77030  
Phone: 713-500-3924


  • Ph.D. in Biomedical Informatics, 2008, Columbia University, New York, NY
  • M.Phil in Biomedical Informatics, 2007, Columbia University, New York, NY
  • M.S. in Computer Science, 2001, New Jersey Institute of Technology, Newark, NJ
  • B.S. in Biochemistry, 1998, Nanjing University, Nanjing, P.R. China

Research Areas

NLP Methods:

  • named entity recognition (NER)
  • abbreviation detection and disambiguation
  • syntactic/semantic parsing
  • active learning
  • temporal information/relation extraction and modeling 

NLP Systems:

  • Medication information extraction – MedEx
  • Development of comprehensive clinical NLP systems

NLP and data mining applications:

  • EMR-based epidemiological studies of cancers
  • Informatics approaches to Pharmacogenomics
  • Drug-ADE detection (pharmacovigilance) from EHR
  • Literature mining of genes and environmental factors


Current Grants:

Repurposing Existing Drugs for Cancer Treatment using Electronic Health Records
CPRIT (Cancer Prevention & Research Institute of Texas), Rising Star Award (PI – Hua Xu)
03/01/2013 – 02/28/2017

Natural language processing for clinical and translational research
NIGMS 1R01GM102282 (MPI – Hongfang Liu, Serguei Pakhomov, and Hua Xu)
04/01/2013 – 03/31/2017

Completed Grants:

An In-silico Method for Epidemiological Studies Using Electronic Medical Records
NCI R01CA141307 (PI – Hua Xu)
09/03/2009 – 07/31/2013

Real-time Disambiguation of Abbreviations in Clinical Notes
NLM R01LM010681 (PI – Hua Xu)
05/31/2010 – 05/30/2013

An Informatics-based Approach to Pharmacogenetic Studies of Warfarin
NIH UL1 RR024975-KL2 Scholar Award (PI – Hua Xu)
07/01/2009 – 06/30/2010


Peer Reviewed Articles - Journal:

  1. Fan JW, Yang EW, Jiang M, Prasad R, Loomis RM, Zisook DS, Denny JC, Xu H, and Huang Y. Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. J Am Med Inform Assoc. 2013, In Press.
  2. Chen Y, Carroll RJ, Shah A, Eyler AE, Denny JC, Xu H. Applying active learning to high-throughput phenotyping algorithms for electronic health records data. J Am Med Inform Assoc. 2013, In Press.
  3. Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc. 2013, In Press.
  4. McCoy AB, Wright A, Eysenbach G, Malin BA, Patterson ES, Xu H, Sittig DF. State of the art in clinical informatics: evidence and examples. IMIA Yearbook of Medical Informatics. 2013, In Press.
  5. Mani S, Chen Y, Li X, Arlinghaus L, Chakravarthy AB, Abramson V, Bhave SR, Levy MA, Xu H, Yankeelov TE. Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. J Am Med Inform Assoc. 2013, In Press.
  6. Wei W, Cronin RM, Xu H, Lasko TA, Bastarache L, Denny JC. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc. 2013, In Press.
  7. Tang B, Cao H, Wu Y, Jiang M, Xu H. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. BMC Medical Informatics and Decision Making 2013, 13(Suppl 1):S1
  8. Chen Y, Cao H, Mei Q, Zheng K, Xu H. Applying Active Learning to Supervised Word Sense Disambiguation in MEDLINE. J Am Med Inform Assoc. 2013, In Press. [PMID: 23364851]
  9. Wiley LK, Shah A, Xu H, Bush WS. ICD-9 Tobacco Use Codes are Effective Identifiers of Smoking Status. J Am Med Inform Assoc. 2013, In Press. [PMID: 23396545]
  10. Liu M, McPeek Hinz ER, Matheny ME, Denny JC, Schildcrout JS, Miller RA, Xu H. Comparative Analysis of Pharmacovigilance Methods in Detection of Adverse Drug Reactions from Electronic Medical Records. J Am Med Inform Assoc. 2012, In Press. [PMID: 23161894]
  11. Xu H, Wu Y, Elhadad N, Stetson PD, Friedman C. A new clustering method for detecting rare sense of abbreviations in clinical notes. J Biomed Inform. 2012, 45(6):1075-83. [PMID: 22742938]
  12. Liu M, Wu Y, Chen Y, Sun J, Zhao Z, Chen X, and Xu H. Large-scale prediction of adverse drug reaction by integrating chemical, biological, and phenotypic properties of drugs. J Am Med Inform Assoc. 2012. 19(e1): e28-e35. [PMCID: PMC3392844]
  13. Denny JC, Schildcrout JS, Bowton EA, Gregg W, Pulley JM, Basford MA, Cowan J, Xu H, Ramirez AH, Crawford DC, Ritchie MD, Peterson JF, Masys DR, Wilke RA, Roden DM. Optimizing drug outcomes through pharmacogenetics: A case for preemptive genotyping. Clin Pharmacol Ther. 2012. 92(2):235-42. [PMID: 22739144]
  14. Wu Y, Levy MA, Micheel CM, Yeh P, Tang B, Cantrell MJ, Cooreman SM, Xu H. Identifying the status of genetic lesions in cancer clinical trial documents using machine learning. BMC Genomics. 2012, 13 Suppl 8:S21 [PMCID: PMC3535695]
  15. Han B, Chen XW, Talebizadeh Z, Xu H. Genetic studies of complex human diseases: characterizing SNP-disease associations using Bayesian networks. BMC Syst Biol. 2012. 6 Suppl 3: S14 [PMCID: PMC3524021]
  16. Lu Y, Xu H, Peterson NB, Dai Q, Jiang M, Denny JC, Liu M. Extracting epidemiological exposure and outcome terms from literature using machine learning approaches. Int J Data Min Bioinform. 2012; 6(4):447-59. [PMID: 23155773]
  17. Roden DM, Xu H, Denny JC, Wilke RA. Electronic Medical Records as a Tool in Clinical Pharmacology: Opportunities and Challenges. Clin Pharmacol Ther. 2012, Apr 25. [PMID: 22534870]
  18. Delaney JT, Ramirez AH, Bowton EA, Pulley JM, Basford MA, Schildcrout JS, Shi Y, Zink R, Oetjens M, Xu H, Cleator JH, Jahangir E, Ritchie MD, Masys DR, Roden DM, Crawford DC, Denny JC. Predicting clopidogrel response using DNA samples linked to an electronic health record. Clin Pharmacol Ther. 2012 Feb;91(2):257-63. [PMID: 22190063]
  19. Birdwell KA, Grady B, Choi L, Xu H, Bian A, Denny JC, Jiang M, Vranic G, Basford M, Cowan JD, Richardson DM, Robinson MP, Ikizler TA, Ritchie MD, Stein CM, Haas DW. The use of a DNA biobank linked to electronic medical records to characterize pharmacogenomic predictors of tacrolimus dose requirement in kidney transplant recipients. Pharmacogenet Genomics. 2012 22(1):32-42. [PMCID: PMC3237759]
  20. Ramirez AH, Shi Y, Schildcrout JS, Delaney JT, Xu H, Oetjens MT, Zuvich RL, Basford MA, Bowton EA, Jiang M, Speltz P, Zink R, Cowan J, Pulley JM, Ritchie MD, Masys DR, Roden DM, Crawford DC, Denny JC. Predicting warfarin dosage in European and African Americans using DNA samples linked to an electronic health record. Pharmacogenomics. 2012, 13(4):407-18. [PMCID: PMC3361510]
  21. Sun J, Xu H, Zhao Z. Network-assisted investigation of antipsychotic drugs and their targets. Chem Biodivers. 2012, 9(5): 900-10. [PMID: 22589091]
  22. Sun J, Wu Y, Xu H, Zhao Z. DTome: a web-based tool for drug-target interactome construction. BMC Bioinformatics. 2012, 13(Suppl 9): S7.
  23. Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA, Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN, Ruderman EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc. 2012 Feb 28. [PMID: 22374935]
  24. Doan S, Collier N, Xu H, Pham HD, and Tu MP. Recognition of medication information from discharge summaries using ensembles of classifiers. BMC Medical Informatics and Decision Making. 2012, 12(1):36. [PMID: 22564405]
  25. Chen Y, Mani S, Xu H. Applying active learning to assertion classification of concepts in clinical text. J Biomed Inform 2012, 45(2): 265-272. [PMCID: PMC3306548]
  26. Wilke RA, Xu H, Denny JC, Roden DM, Krauss RM, McCarty CA, Davis RL, Skaar T, Lamba J, and Savova G. The emerging role of electronic medical records in pharmacogenomics. Clin Pharmacol Ther. 2011, 89(3): 379-86. [PMCID: PMC3204342]
  27. Rosenbloom ST, Denny JC, Xu H, Lorenzi N, Stead WW, Johnson KB. Data from clinical notes: a perspective on the tension between structure and flexible documentation. J Am Med Inform Assoc. 2011, 18(2):181-6. [PMCID: PMC3116264]
  28. Jiang M, Chen Y, Liu M, Rosenbloom ST, Mani S, Denny JC, Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J Am Med Inform Assoc. 2011, 18(5):601-6. [PMCID: PMC3168315]
  29. Xu H, Jiang M, Oetjens M, Bowton EA, Ramirez AH, Jeff JM, Basford MA, Pulley JM, Cowan JD, Wang X, Ritchie MD, Masys DR, Roden DM, Crawford DC, Denny JC. Facilitating pharmacogenetic studies using electronic health records and natural language processing: a case study of warfarin. J Am Med Inform Assoc. 2011; 18(4): 387-91. [PMCID: PMC3128409]
  30. Xu H, AbdelRahman S, Lu Y, Denny JC, Doan S. Applying semantic-based probabilistic context free grammar to medical language processing – a preliminary study on parsing medication sentences. J Biomed Inform 2011, 44(6): 1068-75. [PMCID: PMC3226929]
  31. Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx – A Medication Information Extraction System for Clinical Narratives. J Am Med Inform Assoc. 2010; 17(1):19-24. [PMCID: PMC2995636]
  32. Denny JC, Peterson JF, Choma NN, Xu H, Miller RA, Bastarache L, Peterson NB. Development of a Natural Language Processing System to Identify Timing and Status of Colonoscopy Testing in Electronic Medical Records. J Am Med Inform Assoc. 2010; 17(4): 393-8. [PMCID: PMC2815478]
  33. Doan S, Bastarache L, Klimkowski S, Denny JC, Xu H. Integrating Existing NLP Tools for Medication Extraction from Discharge Summaries. J Am Med Inform Assoc. 2010, 17:528-31. [PMCID: PMC2995674]
  34. Xu H, Stetson P, Friedman C. Methods for Building Sense Inventories of Abbreviations in Clinical Notes. J Am Med Inform Assoc. 2009 16(1):103-108. [PMCID: PMC2605589]
  35. Chen ES, Hripcsak G, Xu H, Markatou M, Friedman C. Automated Acquisition of Disease-Drug Knowledge from Biomedical and Clinical Documents. J Am Med Inform Assoc. 2008, 15(1):87-98. [PMCID: PMC2274872]
  36. Tulipano KP, Tao Y, Millar WS, Zanzonico P, Kolbert K, Xu H, Yu H, Chen L, Lussier YA, Friedman C. Natural language processing and visualization in the molecular imaging domain. J Biomed Inform. 2007; 40:3, 270-281. [PMID: 17084109]
  37. Fan JW, Xu H, Friedman C. Using Contextual and lexical features to restructure and validate the classification of biomedical concepts. BMC Bioinformatics. 2007; 8: 264. [PMCID: PMC2014782]
  38. Xu H, Fan JW, Hripcsak G, Mendonça EA, Markatou M, Friedman C. Gene symbol disambiguation using knowledge-based profiles. Bioinformatics, 2007 23(8):1015-1022. [PMID: 17314123]
  39. Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics. 2006; 7:334. [PMCID: PMC1550263]
  40. Lee HT, Krichevsky IE, Xu H, Ota-Setlik A, D'Agati VD, Emala CW. Local anesthetics worsen renal function after ischemia-reperfusion injury in rats. Am J Physiol Renal Physiol. 2004; 286(1):F111-9. [PMID: 14519592]
  41. Lee HT, Xu H, Nasr SH, Schnermann J, Emala CW. A1 adenosine receptor knockout mice exhibit increased renal injury following ischemia and reperfusion. Am J Physiol Renal Physiol. 2004; 286(2):F298-306. [PMID: 14600029]
  42. Lee HT, Xu H, Ota-Setlik A, Emala CW. Oxidant preconditioning protects human proximal tubular cells against lethal oxidant injury via p38 MAPK and heme oxygenase-1. Am J Nephrol. 2003; 23(5):324-33. [PMID: 12915776]
  43. Lee HT, Ota-Setlik A, Xu H, D'Agati VD, Jacobson MA, Emala CW. A3 adenosine receptor knockout mice are protected against ischemia- and myoglobinuria-induced renal failure. Am J Physiol Renal Physiol. 2003; 284(2):F267-73. [PMID: 12388399]
  44. Lee HT, Xu H, Siegel CD, Krichevsky IE. Local anesthetics induce human renal cell apoptosis. Am J Nephrol. 2003; 23(3):129-39. [PMID: 12586958]

Peer Reviewed Articles - Conference:

  1. Moon S, Berster BT, Xu H, Cohen T. Word sense disambiguation of clinical abbreviations with hyperdimensional computing. AMIA Annu Symp Proc. 2013. Accepted.
  2. Wu Y, Lei J, Wei W, Tang B, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Zhang K, Xu H. Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages. MedInfo, 2013, Copenhagen, Denmark. Accepted.
  3. Sun S, Zhou X, Denny JC, Rosenbloom ST, Xu H. Messaging to Your Doctors: Understanding Patient-Provider Communications via a Portal System. Proceedings of ACM Conference on Human Factors in Computing Systems (SIG’CHI), 2013, Paris, France. In Press.
  4. Tang B, Cao H, Wu Y, Jiang M, Xu H. Clinical Entity Recognition using Structural Support Vector Machines with Rich Features. ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO), 2012, In Press.
  5. Xu H, Stetson PD, Friedman C. Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. AMIA Annu Symp Proc. 2012. 1004-13.
  6. Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Xu H. A comparative study of current clinical natural language processing systems on handling abbreviations in discharge summaries. AMIA Annu Symp Proc. 2012. 997-1003.
  7. Liu M, Shah A, Min J, Peterson NB, Dai Q, Aldrich MC, Chen Q, Bowton EA, Liu H, Denny JC, Xu H. A study of transportability of an existing smoking status detection module across institutions. AMIA Annu Symp Proc. 2012. 577-86.
  8. Jiang M, Denny JC, Tang B, Cao H, Xu H. Extracting semantic lexicons from discharge summaries using machine learning and c-value method. AMIA Annu Symp Proc. 2012. 409-16.
  9. Wu Y, Liu M, Zheng W, Zhao Z, Xu H. Ranking gene-drug relationships in biomedical literature using latent dirichlet allocation. Pac Symp Biocomput. 2012: 422-33. [PMID: 22174297]
  10. Xu H, Fu Z, Shah A, Chen Y, Peterson NB, Chen Q, Mani S, Levy MA, Dai Q, Denny JC. Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases. AMIA Annu Symp Proc. 2011, 1564-72. [PMCID: PMC3244156]
  11. Liu M, Kawai VK, Stein CM, Denny JC, Roden DM, Xu H. Modeling drug exposure data in electronic medical records: an application to warfarin. AMIA Annu Symp Proc. 2011, 815-23. [PMCID: PMC3243123]
  12. Wu Y, Rosenbloom ST, Denny JC, Miller RA, Mani S, Giuse DA, Xu H. Detecting abbreviations in discharge summaries using machine learning methods. AMIA Annu Symp Proc. 2011, 1541-9. [PMCID: PMC3243185]
  13. Xu H, AbdelRahman S, Jiang M, Fan JW, Huang Y. An Initial Study of Full Parsing of Clinical Text using the Stanford Parser. International Workshop on Biomedical and Health Informatics, IEEE Conference of Bioinformatics and Biomedicine (BIBM), 2011.
  14. Xu H, Doan S, Birdwell KA, Cowan JD, Vincz AJ, Haas DW, Basford MA, Denny JC. An automated approach to calculating the daily dose of tacrolimus in electronic health records. AMIA Summits Transl Sci Proc. 2010:71-5. [PMCID: PMC3041548]
  15. Denny JC, Speltz P, Maddox R, Stein G, Xu H, Spickard A. Comparing Content Coverage in Medical Curriculum to Trainee-Authored Clinical Notes. AMIA Annu Symp Proc. 2010, 157-161. [PMCID: PMC3041398]
  16. Doan S and Xu H. Recognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine. COLING 2010, the 23rd International Conference on Computational Linguistics, 259-266.
  17. Xu H, Lu Y, Jiang M, Liu M, Denny JC, Dai Q, Peterson NB. Mining Biomedical Literature for Terms related to Epidemiologic Exposures. AMIA Annu Symp Proc. 2010, 897-901. [PMCID: PMC3041399]
  18. Fan JW, Xu H, Friedman C. Using Distributional Analysis to Semantically Classify UMLS Concepts. In Proceedings of Medinfo. 2007; 519-23. [PMID: 17911771]
  19. Xu H, Fan JW, Friedman C. Combine multiple evidence for gene symbol disambiguation. ACL 2007, BioNLP Workshop, p41-48.
  20. Xu H, Stetson P, Friedman C. A Study of Abbreviations in Clinical Notes. AMIA Annu Symp Proc. 2007; 821-5.
  21. Xu H, Anderson K, Grann V, Friedman C. Facilitating Cancer Research using Natural Language Processing of Pathology Reports. In Proceedings of Medinfo. 2004; 565-72. [PMID: 15360876]