University of South Florida

Scholar Commons Graduate Theses and Dissertations

Graduate School

2011

Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages

Jay Jarman, University of South Florida

Follow this and additional works at: http://scholarcommons.usf.edu/etd

Part of the American Studies Commons, and the Databases and Information Systems Commons

Scholar Commons Citation
Jarman, Jay, "Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages" (2011). Graduate Theses and Dissertations. http://scholarcommons.usf.edu/etd/3166

This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons.

Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages

by

Jay Jarman

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Information Systems and Decision Sciences
College of Business
University of South Florida

Major Professor: Donald J. Berndt, Ph.D.
Stephen L. Luther, Ph.D.
Balaji Padmanabhan, Ph.D.
Rosann W. Collins, Ph.D.

Date of Approval: September 30, 2011

Keywords: data mining, machine learning, computational linguistics, decision tree, rule mining

Copyright © 2011, Jay Jarman

Dedication

I would like to dedicate this work to my wife, Cari Jarman. I don’t know how anyone could do this without a support system. She is the most important person in my support system and has been there for me while I followed my dream. I am eternally grateful to her and will never be able to express my gratitude. Cari, I appreciate everything you do and I love you with all my heart.

Acknowledgments

Funding for this research came from the Consortium for Healthcare Informatics Research, [HIR09-002] HSR&D Center of Excellence, James A. Haley Veterans Hospital, Tampa, FL. This dissertation presents the findings and conclusions of the authors and does not necessarily represent the Department of Veterans Affairs (VA). I would also like to acknowledge the VA and, specifically, Steve Luther. I was given an opportunity to be on a research team and work with some of the best informaticians in this country, and for that I will always be grateful. I would like to acknowledge the help and collaboration of James McCart and Varol Kayhan. Finally, I would like to acknowledge Don Berndt, my chair and mentor. When he came to me and asked if I’d like to do some mining research with the VA, I had no idea that it would turn into a career and that I would gain so many new friends and collaborators. I’m truly appreciative of this opportunity.

Table of Contents

List of Tables

List of Figures

Abstract

Chapter 1 Introduction
1.1 Why the medical domain?
1.2 Research Questions
1.2.1 Research Question One - Specialized Language vs Common Language
1.2.2 Research Question Two - Rule-based Classifiers
Approach One - Classification Rules
Approach Two - Decision Tree Induction (