WP2: Linked Data and Semantics
Artem Revenko
Semantic Web Company

Introduction
Contributions
    Collecting Data
    Quality Assessment
    Data Processing
    Knowledge Graph
    Visualizations
    Crowd-Sourcing
Conclusion

WP2 Overview

Goal
- “... to develop and to establish a methodology for a linked data lifecycle.”
- “... information integration and aggregation of incoming sources will become more precise and can be managed in a more agile way ...”

Platform Functionalities

Objectives

- Utilize expert knowledge to create thesauri and ontologies to
    - compare materials
    - investigate trends and topics
    - assess quality of contributions
    - formalize and extract user preferences
- Harvest and process relevant data. Annotations enable semantic search.
- Use visualization tools to present the data and facilitate deeper analysis
- Use crowdsourcing/interactive tools to improve data and create collective awareness

Deliverables

✔ D2.1 PROFIT core knowledge model (M6)
✔ D2.2 Data and information streams - assessment tools (M9)
✔ D2.3 Data crawlers, adaptors and extractors (M12)
✔ D2.4 PROFIT Knowledge Graph (M12)
✔ D2.5 Visualizations and information widgets (M18)
➜ D2.6 Sharing activities and crowdsourcing mechanisms (M24)

Crawling News Articles

- UnifiedViews pipelines: scheduled harvesting
- 11 sources, 30 named graphs, ≈ 250 articles weekly
- Different methods: RSS feeds, API, permalinks

Statistics: https://custom_apps.poolparty.biz/ProfitViz/page/statistics
GraphSearch: https://profit.poolparty.biz/GraphSearch/
In platform: http://profit-demo.eea.sk:88/articles/
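
As an illustration of the RSS harvesting method, the following is a minimal sketch in plain Python; it is not the UnifiedViews pipeline itself. The sample feed, the URLs under news.example.org, and the function names are hypothetical. It parses feed items and keeps only articles whose permalink has not been harvested before, which is roughly what a scheduled run needs to avoid duplicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real pipeline would download it from a news source.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Finance News</title>
    <item>
      <title>ECB leaves interest rate unchanged</title>
      <link>https://news.example.org/ecb-rate</link>
      <pubDate>Thu, 02 Mar 2017 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Oil price forecast revised</title>
      <link>https://news.example.org/oil-forecast</link>
      <pubDate>Fri, 03 Mar 2017 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_xml):
    """Return one dict per <item> with its title, permalink and publication date."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


def harvest(rss_xml, seen_links):
    """Keep only articles whose permalink was not harvested in an earlier run."""
    fresh = [a for a in parse_rss_items(rss_xml) if a["link"] not in seen_links]
    seen_links.update(a["link"] for a in fresh)
    return fresh


seen = set()
first_run = harvest(SAMPLE_RSS, seen)   # both articles are new
second_run = harvest(SAMPLE_RSS, seen)  # nothing new on the second run
```

Using the permalink as the deduplication key is an assumption of this sketch; the slides only state that permalinks are one of the harvesting methods.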

Document Assessment and Classification

Dimensions of Quality
- Readability
- Relatedness to Topic
- Sentiment
- Extracted Concepts

Document Classification
Classify documents into predefined categories.
Demo: https://artem.semantic-web.at/text_assessment/
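
To make the readability dimension concrete, here is a small sketch of one common measure, the Flesch Reading Ease score. The slides do not say which readability measure the assessment tool uses, so the choice of formula, the vowel-group syllable heuristic, and the two sample sentences are all assumptions.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


# Hypothetical sample texts.
simple = "The bank cut the rate. The market went up."
dense = ("Unconventional monetary accommodation influences "
         "intermediation profitability considerably.")

simple_score = flesch_reading_ease(simple)
dense_score = flesch_reading_ease(dense)
```

With these inputs the short, plain sentences score much higher than the dense one, which is the ordering a readability dimension is meant to capture.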

Topic Transitions

Topics
Topics represent combinations of concepts that often and meaningfully occur together. Transitions show how different topics develop.
Demo: https://artem.semantic-web.at/text_assessment/topic_transitions
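
The idea of concepts that often occur together can be sketched as frequent pair counting over annotated documents. This is only a simplified illustration: the slides do not describe the actual topic detection algorithm, and the function name, the `min_support` threshold, and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(docs_concepts, min_support):
    """Count how many documents mention each concept pair; keep frequent pairs."""
    counts = Counter()
    for concepts in docs_concepts:
        # Sorting makes (a, b) and (b, a) the same key.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical annotations: the set of concepts extracted from each document.
docs = [
    {"ECB", "interest rate", "inflation"},
    {"ECB", "interest rate"},
    {"oil", "OPEC"},
    {"ECB", "inflation", "interest rate"},
]

frequent = cooccurring_pairs(docs, min_support=3)
```

Running the same count on corpora from different time periods and comparing the frequent pairs is one simple way to look at how topics change over time.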

Bright Concepts

Idea
- Sometimes something is not mentioned on purpose
- If there is a change in context, how do we detect it?

Example
- A corpus about US politics with dates
- We split it into several corpora by date
- We analyze the corpora. Findings: “White House”, “politics”, “president”, “Obama” are very often found together before 2016.
- In the fresh (2017) corpus “Obama” disappears.

Outcome: We signal to the user that in the context “White House”, “politics”, “president” the concept “Obama” was substituted by “Trump”.
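
The example above can be sketched as a comparison of two time-sliced corpora. This is a toy illustration under stated assumptions, not the project's implementation: documents are reduced to sets of extracted concepts, the `threshold` value and function names are hypothetical, and the sample corpora are invented to mirror the slide.

```python
def cooccurrence_rate(docs, context, concept):
    """Share of documents containing the whole context that also mention concept."""
    in_context = [d for d in docs if context <= d]
    if not in_context:
        return 0.0
    return sum(1 for d in in_context if concept in d) / len(in_context)


def substituted_concepts(old_docs, new_docs, context, threshold=0.5):
    """Find concepts that left the context and concepts that entered it."""
    candidates = set().union(*old_docs, *new_docs) - context
    vanished = sorted(
        c for c in candidates
        if cooccurrence_rate(old_docs, context, c) >= threshold
        and cooccurrence_rate(new_docs, context, c) == 0.0
    )
    appeared = sorted(
        c for c in candidates
        if cooccurrence_rate(old_docs, context, c) == 0.0
        and cooccurrence_rate(new_docs, context, c) >= threshold
    )
    return vanished, appeared


# Invented toy corpora; each document is the set of concepts found in it.
context = {"White House", "politics", "president"}
before_2016 = [
    context | {"Obama"},
    context | {"Obama", "Congress"},
    context | {"Obama"},
]
corpus_2017 = [
    context | {"Trump"},
    context | {"Trump", "Congress"},
]

vanished, appeared = substituted_concepts(before_2016, corpus_2017, context)
```

Here "Congress" is reported in neither list because it occurs in the context in both periods, while "Obama" is reported as vanished and "Trump" as appeared.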

PROFIT Knowledge Graph

Motivation
Thesauri and ontologies help to
- Set the focus by defining concepts of interest
- Formalize expert knowledge about semantic relations between entities
- Facilitate integration and processing of information. Example: document similarities.

PROFIT Thesaurus

- Reuses EuroVoc, STW; extended by experts
- Statistics:
    # Concepts           10860
    # Broader/narrower   11232
    # Related            27970
    # Languages          25
    # Labels             226255
- Published: http://profit.poolparty.biz/profit_thesaurus.html

Document Similarity

[Figure: two documents are compared through the thesaurus concepts they mention.
Thesaurus concepts shown: Money and Markets, ECB, Mario Draghi, Monetary Policy, Interest Rate.
Document 1: "Mario Draghi was not expected to announce any changes to monetary policy" (concepts: Mario Draghi, Monetary Policy).
Document 2: "European Central Bank leaves interest rate at record-low 1%" (concepts: ECB, Interest Rate).
The weights 0.8 and 0.5 appear in the figure.]
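
A thesaurus-based similarity of this kind can be sketched as follows. The two documents share no extracted concept, so a plain overlap measure gives zero; expanding each document with linked thesaurus concepts makes them comparable. Which thesaurus link carries the weight 0.8 and which carries 0.5 is an assumption here, as are the expansion rule and the use of cosine similarity.

```python
import math

# Hypothetical thesaurus fragment: concept -> {linked concept: weight}.
# The assignment of 0.8 and 0.5 to these particular links is an assumption.
THESAURUS_LINKS = {
    "Mario Draghi": {"ECB": 0.8},
    "ECB": {"Mario Draghi": 0.8},
    "Monetary Policy": {"Interest Rate": 0.5},
    "Interest Rate": {"Monetary Policy": 0.5},
}


def expand(concepts):
    """Weighted concept vector: 1.0 for extracted concepts, link weight for neighbours."""
    vector = {c: 1.0 for c in concepts}
    for c in concepts:
        for neighbour, weight in THESAURUS_LINKS.get(c, {}).items():
            vector[neighbour] = max(vector.get(neighbour, 0.0), weight)
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc1 = {"Mario Draghi", "Monetary Policy"}  # concepts extracted from Document 1
doc2 = {"ECB", "Interest Rate"}             # concepts extracted from Document 2

plain = cosine({c: 1.0 for c in doc1}, {c: 1.0 for c in doc2})
expanded = cosine(expand(doc1), expand(doc2))
```

Without the thesaurus the similarity is 0; with the assumed links it is about 0.9, which reflects that both documents are about the same subject.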

User Multi-Classification

User Cycle
1. User registers and fills out some information about themselves
2. User self-assigns and is also assigned to a level/category
3. User uses the platform, leaves comments, tries quizzes
4. User takes educational materials and tests

When does a user reach a “new” level?
Taking the user’s activity on the platform into account, we can automatically reassign the user based on the other users’ statistics.
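
One possible reading of reassignment based on other users' statistics is a nearest-centroid rule: average the activity of the users in each level and move a user to the level whose average is closest. The slides do not specify the method, so the rule, the three activity features, the level names, and the sample numbers below are all assumptions.

```python
import math
from collections import defaultdict


def level_centroids(users):
    """Average the activity features of the users in each level."""
    grouped = defaultdict(list)
    for features, level in users:
        grouped[level].append(features)
    return {
        level: tuple(sum(column) / len(column) for column in zip(*rows))
        for level, rows in grouped.items()
    }


def reassign(features, centroids):
    """Return the level whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda level: math.dist(features, centroids[level]))


# Hypothetical statistics: (comments written, quizzes passed, tests passed), level.
others = [
    ((2, 1, 0), "beginner"),
    ((4, 2, 1), "beginner"),
    ((20, 9, 5), "advanced"),
    ((24, 11, 7), "advanced"),
]

centroids = level_centroids(others)
new_level = reassign((18, 8, 4), centroids)
```

In practice the features would likely need to be normalized so that one high-volume activity does not dominate the distance; `math.dist` requires Python 3.8 or later.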

Experiment

Ontologies

- Finance Ontology: http://profit.poolparty.biz/PROFIT_Finance.html
- User Ontology: http://profit.poolparty.biz/PROFIT-User.html
- Initial Knowledge Graph (≈ 500 statements about 50 entities)

Visualization of Data Graph

Goal
Enable navigation in a huge data graph such that
1. No information is hidden
2. Not too much information on the page

Demo: https://custom_apps.poolparty.biz/ProfitViz/page/thesaurus

- Vertical hierarchy: more natural to scroll down
- Hide intermediate nodes, only show how many of them exist
- Show concept schemes / top concepts (plural in case of poly-hierarchies)
- Show total number of children and number of direct children of this concept, parent concept, top concept
- Show all the concepts that are in any relation to this concept
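
The two child counts shown per concept can be computed with a simple graph traversal. The sketch below is an illustration, not the code behind the demo: the `NARROWER` mapping is a hypothetical poly-hierarchy (the "Banking" concept and all links are invented), and the traversal counts each descendant once even when it is reachable through several parents.

```python
def descendants(concept, narrower):
    """All concepts reachable through narrower links, each counted once."""
    seen = set()
    stack = list(narrower.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(narrower.get(node, ()))
    return seen


def child_counts(concept, narrower):
    """Return (number of direct children, total number of descendants)."""
    direct = len(set(narrower.get(concept, ())))
    return direct, len(descendants(concept, narrower))


# Hypothetical poly-hierarchy: "Interest Rate" has two broader concepts.
NARROWER = {
    "Money and Markets": ["Monetary Policy", "Banking"],
    "Monetary Policy": ["Interest Rate"],
    "Banking": ["Interest Rate", "ECB"],
}

top_counts = child_counts("Money and Markets", NARROWER)
leaf_counts = child_counts("ECB", NARROWER)
```

Tracking visited nodes in a set matters in a poly-hierarchy: a naive recursive sum would count "Interest Rate" twice under "Money and Markets".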

Numerical and Topical Trends

- Visualize time series and trends
- Intuitive, interactive, user-friendly

Demo: http://profit-demo.eea.sk:88/forecasting/oil/

Crowd-Sourcing

- Users may suggest new elements: add concepts, assign classes to concepts, add relations between concepts
- Other users may vote on the extensions; when enough votes are collected, the input is added to the KG.
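
The suggest-and-vote workflow can be sketched as a small queue with a vote threshold. The class names, the `min_votes` parameter, the one-vote-per-user rule, and the triple representation of a suggestion are assumptions made for illustration; the slides only state that input is added to the KG once enough votes are collected.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """A proposed statement for the knowledge graph plus the users who voted for it."""
    subject: str
    predicate: str
    obj: str
    voters: set = field(default_factory=set)


class CrowdQueue:
    def __init__(self, min_votes):
        self.min_votes = min_votes    # assumed acceptance threshold
        self.pending = []             # suggestions still waiting for votes
        self.knowledge_graph = set()  # accepted (subject, predicate, object) triples

    def suggest(self, subject, predicate, obj):
        suggestion = Suggestion(subject, predicate, obj)
        self.pending.append(suggestion)
        return suggestion

    def vote(self, suggestion, user):
        """Record a vote; return True if this vote got the suggestion accepted."""
        suggestion.voters.add(user)  # a set ignores repeated votes by the same user
        if len(suggestion.voters) >= self.min_votes and suggestion in self.pending:
            self.pending.remove(suggestion)
            self.knowledge_graph.add(
                (suggestion.subject, suggestion.predicate, suggestion.obj)
            )
            return True
        return False


queue = CrowdQueue(min_votes=2)
proposal = queue.suggest("Mario Draghi", "related", "ECB")
results = [
    queue.vote(proposal, "alice"),
    queue.vote(proposal, "alice"),  # duplicate vote, not counted twice
    queue.vote(proposal, "bob"),
]
```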

Conclusion
- Linked data lifecycle established
- Harvesting pipelines up and running, data gets integrated
- Processing tools facilitate data consumption

Outcomes
1. Harvesting of news articles: 250 weekly with annotations
2. Assessment: quality, classification, topic trends
3. PROFIT thesaurus := EuroVoc + STW + expert input
4. PROFIT ontologies and knowledge graph
5. Visualization of thesaurus: unique tool
6. Numerical and topical visualization tools
7. Crowd-sourcing: collaborative workflows on data graph

Thank you!
projectprofit.eu