Social curation as corpora for large-scale multimedia content analysis
Akisato Kimura
NTT Communication Science Laboratories
[email protected] @_akisato
Since 2013-02-20 / Copyright 2013 NTT Communication Science Laboratories. All Rights Reserved.

Acknowledgment • I sincerely thank the following collaborators for their great contributions to this work.

Prof. Kevin Duh @ NAIST

Dr. Katsuhiko Ishiguro (石黒 勝彦)

Mr. Koh Takeuchi (竹内 孝)

Ms. Kaori Kataoka (片岡 香織)

Summary • Social curation has great potential for analyzing large-scale multimedia content.

1. Curated contents share a specific context, unlike ordinary social media contents.

2. A large number of curated collections are freely available.

Agenda
1. What is “Social Curation”?
2. Why is “Social Curation” emerging?
3. How does “Social Curation” work?
4. What is next?

What is “Social Curation”?


What is “curation”?

(From Wikipedia) • Traditionally, a curator of a cultural heritage institution (e.g., gallery, museum, library or archive) is a content specialist responsible for an institution's collections and involved with the interpretation of heritage material.

Standard flow of social media
Creators generate and upload contents to social services
Consumers enjoy created contents

Process of social curation
Creators generate and upload contents to social services
Curator picks up contents and summarizes them in a list
Consumers enjoy summarized lists

Curators: the significance
Curator picks up contents and summarizes them in a list
Curators make curated contents significantly different from existing social media contents.

Statistics “Pew Internet: Social Networking (full detail)”
– 46% of adult internet users post original photos or videos online that they themselves have created. We call them creators.
– 41% of adult internet users take photos or videos that they have found online and repost them on sites designed for sharing images with many people. We call them curators.
Overall, 56% of internet users do at least one of the creating or curating activities we studied, and 32% of internet users do both creating and curating activities.

(Cf. http://pewinternet.org/Commentary/2012/March/Pew-Internet-Social-Networking-full-detail.aspx)

Example 1 (in Japan)

• Specializes in Twitter messages
• One of the most popular curation services in Japan

(cf. http://matome.naver.jp/odai/2134491562701539701)

Example 2 (in Japan)

• One of the most popular curation services in Japan
• Can deal with various multimedia contents (microblogs, news, images, music, videos…)
• Provides incentive compensation plans for curators
(cf. http://matome.naver.jp/odai/2134491562701539701)

Example 3

• Quite similar to Naver Matome, but more sophisticated
• Can deal with various multimedia contents
• One of the most popular curation services in the world
(cf. http://storify.com/weatherchannel/meteor-slams-earth-injuring-hundreds)

Example 4
• Specializes in images
• One of the fastest-emerging curation services in the world

(cf. http://pinterest.com/akisatoo/)

Example 5

(cf. http://www.pearltrees.com/#/N-u=1_334641&N-p=23176992&N-s=1_3109322&N-fa=3109322&N-f=1_3109322)


Lots of curation services available

(Cf. http://www.youbrandinc.com/ultimate-lists/ultimate-list-content-curation-tools-platform/)


Social Curation Summit

(Cf. http://www.mediabistro.com/socialcurationsummit/)


Why is “Social Curation” emerging?


Win-win-win relationships
[Curators]
• Convenient as a scrapbook
• Useful for self-branding and (stealth) marketing

Win-win-win relationships
[Consumers]
• Avoid drowning in a sea of social media
• Enjoy more focused stories

Win-win-win relationships
[Service providers]
• Probe individual preferences & “hidden voices” of users

Social curation as a second medium
• Social curation = social media as sensors + focused storytelling
[Figure labels: Social media, Social curation, Mass media]

Automatic social curation

(Cf. http://paper.li/g9ine/1307853228)


Automatic social curation (in Japan)

(Cf. http://gunosy.com)


Uniqueness of Japanese culture
• Japanese social media users have long been familiar with social curation tools as “matome sites” (まとめサイト, summary sites).

• But we had not noticed that they are social curation services; we regarded them as a kind of SNS.

Social media as corpora
• Feasible to exploit them as sensors for real-world events and internet memes
Sadilek+ “Modeling Spread of Disease from Social Interactions,” Proc. ICWSM12 (Best Paper Award candidates)

(Cf. http://www.youtube.com/watch?v=3S2rq2SKTSw)

Social curation as corpora
1. Curated contents are manually selected.
– Only “credible” contents survive in the process.

Social curation as corpora
2. Curated contents share the same context.
– They effectively present the curator’s interest or perspective.

(Cf. http://storify.com/cbccommunity/how-has-texting-changed-your-life)

How does “Social Curation” work?


Content analysis with social curation
• Despite the excitement around social curation, little research has been done so far.
– Most previous work dealing with social curation comes from sociology, not computer science.
– Most previous work from CS communities handled social media contents, not socially curated contents.

• Social curation has plenty of room to grow as a promising research topic!

3 aspects for handling social curation
• Data resources – Different resources yield different analysis results.
• Knowledge & findings – The most significant aspect among all the aspects.
• Methods – Sophisticated methods aren’t required, since data is clean and well organized.

2 policies for social curation analysis
• Wisdom or diversity of crowds
[Figure labels: Vegetables, Green, Fruits, Strange, Funny Face]

Our contributions (rows: policies; columns: data resources)

Policy | Togetter | Pinterest
Wisdom of crowds | Finding topic-wise power users [Takeuchi+ IBIS12]; Estimating image interestingness [Ishiguro+ ICDM12] | Image context discovery [Kimura+ to appear]
Diversity, personalization | Assisting social curation [Duh+ ICWSM12] | User preference analysis [Kataoka+ PRMU13]

Assisting social curation


Demo video


Contributions
• Data
– Togetter
• Finding
– Curated contents are supervised corpora.
– This can be acquired through large corpus analysis.
• Method
– Learning to rank: a popular IR technique
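
The learning-to-rank idea named above can be sketched roughly as follows. This is only an illustrative pairwise ranker with hinge-style updates, not the method of [Duh+ ICWSM12]; the three features and all toy values are hypothetical. Tweets already placed in a list by a curator are treated as positives, other tweets as negatives, and a linear scorer is trained so that positives score higher.

```python
import random

def dot(w, x):
    """Linear score of one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_ranker(positives, negatives, epochs=50, lr=0.1, seed=0):
    """Learn weights so curated tweets outscore non-curated ones by a margin."""
    rng = random.Random(seed)
    w = [0.0] * len(positives[0])
    pairs = [(p, n) for p in positives for n in negatives]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for p, n in pairs:
            diff = [pi - ni for pi, ni in zip(p, n)]
            if dot(w, diff) < 1.0:  # margin violated: move w toward the difference
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def rank(w, candidates):
    """Return candidate names sorted from highest to lowest score."""
    return sorted(candidates, key=lambda name: dot(w, candidates[name]), reverse=True)

# Hypothetical features: [word overlap with the list, author already in list, has URL]
curated = [[0.9, 1.0, 0.0], [0.7, 1.0, 1.0], [0.8, 0.0, 1.0]]
not_curated = [[0.1, 0.0, 0.0], [0.2, 0.0, 1.0], [0.0, 1.0, 0.0]]

w = train_pairwise_ranker(curated, not_curated)
candidates = {"tweet_a": [0.85, 1.0, 1.0], "tweet_b": [0.05, 0.0, 0.0]}
ranking = rank(w, candidates)
```

In a curation-assist setting, the top of `ranking` would be what is suggested to the curator for manual inspection.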


Corpus analysis of Togetter
• # curated contents in a list: median 40 tweets, 90th percentile 250 tweets
– Sufficiently long for analysis
• # users involved in a list: median 6 users, 90th percentile 60 users
– Involves several users; only social curation could capture it.
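
Statistics of this kind (median and 90th percentile of per-list counts) could be computed along these lines. The counts below are made-up toy values, not the Togetter corpus, and the choice of the "inclusive" quantile method is an assumption.

```python
import statistics

def summarize(counts):
    """Return (median, 90th percentile) of per-list counts."""
    median = statistics.median(counts)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile
    p90 = statistics.quantiles(counts, n=10, method="inclusive")[-1]
    return median, p90

# Hypothetical number of tweets in each curated list
tweets_per_list = [5, 12, 20, 33, 38, 40, 58, 90, 150, 250, 400]
median, p90 = summarize(tweets_per_list)
```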


Corpus analysis of Togetter
[Figure annotations: “Curators gather lots of tweets by other people”; “Occupied by curators’ tweets”]

Findings from the stats
• Lists are very diverse – in terms of size, topic, and purpose.

• Lists are rather large and elaborate – by curators’ manual efforts.

• Lists are probably suitable as corpora – for analyzing microblog messages.

Assisting social curation
• Content discovery is time-consuming
• Let curators focus on filtering & presentation
[Figure labels: Twitter Timelines; Partially Curated List (queries); tweet; retrieve; aggregate; Ranking; suggest new contents; manual inspection; more?]

Finding topic-wise power users


Demo (snapshot)


Contributions
• Data
– Togetter
• Findings
– Lists are diverse, but each list has a specific topic.
– Only informative tweets would be gathered into a list.
– Topically credible users frequently appear in lists.
• Method
– Novel extension of non-negative matrix factorization

Method in detail
• Formulated as a factorization of multiple matrices correlated to each other
[Figure: data matrices X, Y, Z are factorized into factor matrices W, H, A, B. Axis labels: words, lists, users, popularity. Size labels: 160K, 230K, 1.82M, 2. Sparsity labels: 0.0017% non-zero, 1.23% non-zero.]
Output examples: power users and lists about a topic “space”; power users and lists about a topic “Hadoop”
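
As a rough illustration of factorizing several correlated matrices at once, the sketch below factorizes two toy matrices that share one factor. It is not the stacked NMF of [Takeuchi+ IBIS12]: the slide gives no update rules, so standard multiplicative updates are used, and the roles assigned to X (lists × words), Y (lists × users), A, W, and H are assumptions made for the example.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mult_update(m, num, den, eps=1e-9):
    """Element-wise multiplicative update; keeps every entry non-negative."""
    return [[v * n / (d + eps) for v, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(m, num, den)]

def sq_error(x, approx):
    return sum((v - u) ** 2 for rx, ra in zip(x, approx) for v, u in zip(rx, ra))

def joint_nmf(X, Y, k, iters=200, seed=0):
    """Approximate X ~ A W and Y ~ A H with a shared non-negative factor A."""
    rng = random.Random(seed)
    def rand(r, c):
        return [[rng.random() + 0.1 for _ in range(c)] for _ in range(r)]
    A, W, H = rand(len(X), k), rand(k, len(X[0])), rand(k, len(Y[0]))
    history = []
    for _ in range(iters):
        At = transpose(A)
        AtA = matmul(At, A)
        W = mult_update(W, matmul(At, X), matmul(AtA, W))
        H = mult_update(H, matmul(At, Y), matmul(AtA, H))
        num = add(matmul(X, transpose(W)), matmul(Y, transpose(H)))
        den = matmul(A, add(matmul(W, transpose(W)), matmul(H, transpose(H))))
        A = mult_update(A, num, den)
        history.append(sq_error(X, matmul(A, W)) + sq_error(Y, matmul(A, H)))
    return A, W, H, history

# Toy data: 4 lists, 4 words, 3 users (all values hypothetical)
X = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 3]]
Y = [[2, 0, 0], [1, 0, 0], [0, 3, 1], [0, 1, 2]]
A, W, H, history = joint_nmf(X, Y, k=2)

# Index of the highest-weighted user per topic, a stand-in for a "power user"
top_user_per_topic = [max(range(len(row)), key=row.__getitem__) for row in H]
```

Sharing A is what ties the two factorizations together: each topic gets both a word profile (row of W) and a user profile (row of H).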


Qualitative analysis: Topic 1 “Space”
Topical words: 宇宙 (space), 衛星 (satellite), 地球 (earth), ロケット (rocket), 軌道 (orbit), JAXA, はやぶさ (Hayabusa), 探査 (exploration), 打ち上げ (launch), 飛行 (flight), 移籍 (transfer), あかつき (Akatsuki)

Authority | Profile | #follow
hadukino | Space and astronomy in China | 1,321
ohnuki_tsuyoshi | Space developer | 5,477
5thstar | Space pilot finalist | 2,386
ShinyaMatsuura | Writer on space development | 18,609
sctracker | Spacecraft tracker | 847

Qualitative analysis: Topic 2 “Hadoop”
Topical words: 使う (use), Hadoop, クラウド (cloud), さん (Mr.), AWS, ない (not), 話 (story), データ (data), これ (this), いい (good), 開発 (development), セッション (session), PHP, 処理 (processing), DB, サーバ (server)

Authority | Profile | #follow
understeer | Solutions Architect | 2,368
yutuki_r | Cloud, Hadoop, NoSQL | 800
kimtea | NoSQL, Google AppEngine | 2,742
repeatedly | D, Ruby, Python, Opera | 4,081
siena | Web techniques and design | 1,065

Estimating interestingness of images


Contributions
• Data
– Togetter
• Domain knowledge & findings
– Many lists contain image content.
– View counts of images can be regarded as a quantitative and measurable proxy of interestingness.
• Methods
– Dispense with expensive CV/PR techniques

Formulation • View count prediction by curation information
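
One very simple way to picture view count prediction from curation information is a least-squares fit in log space. The sketch below is only an illustration under stated assumptions, not the formulation of [Ishiguro+ ICDM12]: the single feature (view count of the curated list that contains the image) and all numbers are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical training data: views of the curated list vs. views of its image
list_views = [10, 100, 1000, 10000]
image_views = [5, 50, 500, 5000]

# Fit in log space, since view counts span several orders of magnitude
xs = [math.log10(v) for v in list_views]
ys = [math.log10(v) for v in image_views]
slope, intercept = fit_line(xs, ys)

def predict_image_views(list_view_count):
    """Predict an image's view count from its list's view count."""
    return 10 ** (slope * math.log10(list_view_count) + intercept)

predicted = predict_image_views(200)
```

A real model would combine many curation features; the point here is only that such features are cheap to obtain compared with image analysis.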


Results
• Easily available social curation features performed very well.

Individual preference analysis


Contributions
• Data
– Pinterest
• Domain knowledge & findings
– Every curated image collection (board) reflects individual preference of a user.
• Methods
– Saliency-based object detection
– Image clustering based on a topic model (LDA)

Pinterest vs. Flickr

Aspect | Flickr | Pinterest
Service style | (1) Content sharing (2) Social network | (1) Social curation (2) Social network
Network structure | User-centric network | Content-centric network
Resources | Uploading original contents is the major style. | Most contents come from outside, including Flickr.
Popularity | Still mainstream, but on a downward trend. | Emerging, becoming mainstream.
Research progress | Frequently used in CS research. | Little research has been done.

Framework • Corpora for personalized image retrieval


Qualitative results
• Confirm many boards reflect user preferences
[Figure: images in a board (probably reflecting user preferences), compared for Proposed method (MAP estimation), Feature-based NN, and Random sampling]

Application examples
• Query = “Christmas”
[Figure panels: Storage (board) of User 1; Storage (board) of User 2; Retrieval results for User 1; Retrieval results for User 2]

What is next?

Future of social curation services
• We are now at the dawn of social curation
– But it’s a “social curation” bubble, coming from over-expectation for a new trend.
– A selection process will start soon.

• The key would be inter-service partnerships.
– Remember why service providers do it: to probe user preferences & “hidden voices”
– Fundamental limitation of a single service

Future of social curation services
• Movements of web giants cannot be ignored
– Information stored in social curation services would be in great demand among portal sites, SNSs, and e-commerce sites.

• Will become a segmented market
1. Marketing: collaborating with ads and e-commerce.
2. Portal: entertaining users

Future of social curation as corpora
• Still before dawn in CS communities
– Finding a promising corpus yields 3 papers :-) (Analysis, method, application)

• The bar to entry is high
– Requires various knowledge and techniques in a wide range of research areas
• NLP, IP, SP, IR, ML, network analysis, human activity etc.

Future of social curation as corpora
• The fastest way to learn domain knowledge is to be a user.
• Multi-platform analysis would be promising.
– Fundamental limitation of a single platform.
– The point is the way of identifying users.

• Which do you want, wisdom or diversity?
– Wisdom: An alternative to crowdsourcing
– Diversity: Analyzing contents from multiple views

Summary • Social curation has great potential for analyzing large-scale multimedia content.

1. Curated contents share a specific context, unlike ordinary social media contents.

2. A large number of curated collections are freely available.

• The presentation slides will be uploaded soon. http://bit.ly/prmu2013FebCuration

Akisato Kimura, Ph.D. NTT Communication Science Laboratories [email protected] @_akisato

Appendix


References
• Duh+ “Creating stories: Social curation of Twitter messages,” Proc. ICWSM2012.
• Takeuchi+ “Stacked non-negative matrix factorization for social media analysis,” Proc. IBIS2012 (in Japanese).
• Ishiguro+ “Towards automatic image understanding and mining via social curation,” Proc. ICDM2012.
• Kataoka+ (yesterday’s talk, in Japanese).