Information Extraction: Techniques, Advances and Challenges

Heng Ji
Computer Science Department and Linguistics Department
Queens College and Graduate Center, City University of New York
June 12, 2012

Outline

Introduction
Basic Information Extraction (IE)
Advanced IE
Popular Research Directions
    Enhance Quality
    Enhance Portability
    Cross-source IE
    IE for Noisy Data
Resources

Introduction

What is IE
Why IE is Useful
IE History

What is IE

(In this talk) Information Extraction (IE) = identifying instances of facts (names/entities, relations and events) in semi-structured or unstructured text, and converting them into structured representations (e.g. databases)

Barry Diller on Wednesday quit as chief of Vivendi Universal Entertainment.

Trigger: Quit (a “Personnel/End-Position” event)

Arguments:
    Role = Person: Barry Diller
    Role = Organization: Vivendi Universal Entertainment
    Role = Position: Chief
    Role = Time-within: Wednesday (2003-03-04)
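
The event above can be held in a small structured record, which is what "converting text into a structured representation" amounts to in practice. The following is only a minimal sketch: the class names `Argument` and `EventMention` and the `to_record` helper are hypothetical, chosen for illustration, and are not taken from the talk or from any particular IE toolkit. The values are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Argument:
    """One event argument: a role label and the text span filling it."""
    role: str
    text: str


@dataclass
class EventMention:
    """Hypothetical container for one extracted event mention."""
    event_type: str
    subtype: str
    trigger: str
    arguments: List[Argument] = field(default_factory=list)

    def to_record(self) -> Dict[str, str]:
        """Flatten the event into a single row, e.g. for a database table."""
        record = {
            "type": f"{self.event_type}/{self.subtype}",
            "trigger": self.trigger,
        }
        for arg in self.arguments:
            record[arg.role] = arg.text
        return record


# Values copied from the slide; the structure itself is an assumption.
event = EventMention(
    event_type="Personnel",
    subtype="End-Position",
    trigger="quit",
    arguments=[
        Argument("Person", "Barry Diller"),
        Argument("Organization", "Vivendi Universal Entertainment"),
        Argument("Position", "Chief"),
        Argument("Time-within", "Wednesday (2003-03-04)"),
    ],
)

record = event.to_record()
for key, value in record.items():
    print(f"{key}: {value}")
```

Flattening to one row per event assumes each role occurs at most once per event; an event with several arguments in the same role would need a list-valued field instead.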

Why IE is Useful 

IE can build a database with the information on a given relation or event from news, financial, bio-medical domains…
    Attack/arrest events
    People’s jobs
    People’s whereabouts
    Merger and acquisition activity
    Disease outbreaks
    Patient records
    Experiment chains in scientific papers

Component technology for other areas
    Question Answering (QA)
    Summarization
    Automatic translation
    Document indexing
    Structured Search: “who are the top employees of IBM from 2002-2012?”
    Opinion Mining/Sentiment Extraction
    Text Data Mining over Extracted Relationships

Application Example: Dynamic Event Tracking

(Chen and Ji, 2009)

http://nlp.cs.qc.cuny.edu/demo/personvisual.html

IE for Scientific Literature

For sequestration, the CO2 captured from a fossil fuel plant is first compressed until the combined heat and pressure make it "supercritical", a state in which it displays both gas and fluid properties. At 3 kilometers, you needed only 10 wells because the increased temperature lowered the viscosity of the CO2, allowed it to slide more easily into the reservoir. Supercritical CO2 is buoyant and will rise above the other fluids. If it rises high enough (above a depth of 2,600 feet), it will return to a gaseous state.

[Figure: event graph extracted from the passage above. Centroid = “CO2 Geological Sequestration”.
Event nodes: capture, compress, rise, lower, slide.
Argument labels and values: Object = CO2; Place = fossil fuel plant, reservoir; State = supercritical, gaseous; Depth = 2,600 feet, 3 kilometers; Agent = increased temperature; Target = viscosity; Volume = 10 wells.
Links between events: (subsequence), (causal relation).]

Real Application: Terrorism Networks Extraction

Demo URL: http://blender2.cs.qc.cuny.edu/BlenderGraph
Demo Video: http://nlp.cs.qc.cuny.edu/terrorism.m4v

IE History: Early Projects 

Knowledge-based, rule-based

FRUMP – 1979
    Newswire

LSP (Linguistic String Project) – 1981
    Naomi Sager et al.
    AMA – American Medical Association
    Patient summaries

IE History: MUC 

MUC – Message Understanding Conferences (