Lecture notes - Cambridge Computer Laboratory

Natural Language Processing: Part II

Overview of Natural Language Processing (L90): Part III/ACS 2015, 12 Lectures, Michaelmas Term

September 17, 2015

Ekaterina Shutova ([email protected])
http://www.cl.cam.ac.uk/users/es407/

© Ann Copestake, 2003–2015. Based on the lecture notes originally written by Ann Copestake. Copyright 2015–16 version updated by E. Shutova.

Lecture Synopsis

Aims

This course introduces the fundamental techniques of natural language processing. It aims to explain the potential and the main limitations of these techniques. Some current research issues are introduced, and some current and potential applications are discussed and evaluated.

1. Introduction. Brief history of NLP research, current applications, components of NLP systems.
2. Finite-state techniques. Inflectional and derivational morphology, finite-state automata in NLP, finite-state transducers.
3. Prediction and part-of-speech tagging. Corpora, simple N-grams, word prediction, stochastic tagging, evaluating system performance.
4. Context-free grammars and parsing. Generative grammar, context-free grammars, parsing with context-free grammars, weights and probabilities. Limitations of context-free grammars.
5. Constraint-based grammars. Constraint-based grammar, unification.
6. Compositional semantics. Compositional semantics with lambda calculus. Simple compositional semantics in constraint-based grammar. Dependency structures. Inference and robust entailment.
7. Lexical semantics. Semantic relations, WordNet, word senses, word sense disambiguation.
8. Distributional semantics. Representing lexical meaning with distributions. Similarity metrics. Clustering.
9. Discourse and dialogue. Anaphora resolution, discourse relations.
10. Language generation. Generation and regeneration. Components of a generation system.
11. Computational psycholinguistics. Modelling human language use.
12. Recent trends and applications. Recent trends in NLP research and examples of practical applications of NLP techniques.
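To give a flavour of the kind of technique covered in the synopsis above, item 3 mentions simple N-grams for word prediction. The following is a minimal illustrative sketch (not taken from the course materials) of bigram-based word prediction: bigram frequencies are counted over a toy corpus, and the most frequent continuation of a word is predicted by maximum-likelihood estimation. The corpus and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count bigram frequencies: counts[w1][w2] = frequency of (w1, w2)."""
    counts = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        counts[w1][w2] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent word observed after `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # 'cat' ('cat' follows 'the' twice)
```

Real systems use much larger corpora and smoothing to handle unseen bigrams; this sketch shows only the core counting idea behind the simplest N-gram models discussed in Lecture 3.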
Objectives

At the end of the course students should:
• be able to discuss the current and likely future performance of several NLP applications;
• be able to describe briefly a fundamental technique for processing language for several subtasks, such as morphological processing, parsing, word sense disambiguation, etc.;
• understand how these techniques draw on and relate to other areas of computer science.

Overview

NLP is a large and multidisciplinary field, so this course can only provide a very general introduction. The idea is that this is a ‘taster’ course that gives an idea of the different subfields and illustrates a few of the huge range of computational techniques that are used. The first lecture is designed to give an overview, including a very brief idea of the main applications and the methodologies which have been employed. The history of NLP is briefly discussed as a way of putting this into perspective.

The next nine lectures describe some of the main subdisciplines in more detail. The organisation is mainly based on increasing ‘depth’ of processing, starting with relatively surface-oriented techniques and progressing to the meaning of sentences and the meaning of utterances in context. Most lectures will start off by considering the subarea as a whole and then go on to describe one or more sample algorithms which tackle particular problems. The algorithms have been chosen because they are relatively straightforward to describe and because they illustrate a specific technique which has been shown to be useful, but the idea is to exemplify an approach, not to give a detailed survey (which would be impossible in the time available).

Lectures 2–9 are primarily about analysing language; lecture 10 discusses generation. Lecture 11 introduces some issues in computational psycholinguistics. The final lecture is intended to give further context: it will include discussion of one or more NLP systems. The material in Lectures 11 and 12 will not be directly examined. Slides for Lectures 11 and 12 will be made available via the