4322 North Quad, 105 S. State St. Ann Arbor, MI 48109-1285
Coordination and Efficiency in Decentralized Collaboration
Daniel M. Romero, School of Information, University of Michigan
In collaboration with Dan Huttenlocher and Jon Kleinberg
Coordination in Decentralized Collaboration Environments
Explicit coordination: Requires interaction. Costly and time-consuming.
Implicit coordination: Little interaction. Requires a mutual mental model [Cannon-Bowers 90].
Trade-off: Coordination benefits vs. cost.
In this paper: How coordination levels vary depending on: 1) quality of output, 2) team composition.
Coordination in Wikipedia
Data: All of Wikipedia edit history up to April 1, 2007: a total of 3.4 million articles edited by 500K users.
Coordination Tools:
1. Discussion pages: Discussion of any issues with the article.
2. Edit comments: Comments that explain the nature of each edit.
Discussion and Comment Topics
Discussion Pages:
• Justify edit: 18
• Suggest edit: 33
• Provide reference: 20
• Question: 13
• Copyright issue: 8
• Dispute claim in article: 12
• Future direction: 8
• Other: 6
Edit Comments:
• Mentions section: 52
• Reverted edit: 14
• Minor edit: 19
• Added content: 14
• Removed content: 7
• Correction: 2
• Mentions other users: 14
• Other: 11
Featured Wikipedia Articles
• One featured article is chosen each day by the Wikipedia community through a peer review process (2,563).
• Featured articles are reviewed for accuracy, neutrality, completeness, and style.
Do editors of featured articles coordinate more or less than editors of non-featured articles?
Coordination and Quality
x-core: Smallest set of editors responsible for x% of all edits.
[Figure: two panels plotting, against x (0.2 to 1.0), the number of comments by the x-core and the number of discussion edits by the x-core. Red: featured; blue: non-featured.]
Higher levels of coordination associated with higher quality
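The x-core can be computed greedily: adding editors in decreasing order of edit count reaches a given share of edits with as few editors as possible. This is a minimal sketch, not the authors' code; it assumes a hypothetical input format (one editor name per article edit, and one per discussion edit or comment) and treats x as a fraction, as on the plots' horizontal axis.

```python
from collections import Counter


def x_core(article_edits: list[str], x: float) -> set[str]:
    """Smallest set of editors responsible for at least a fraction x of all edits.

    article_edits: one editor name per article edit (hypothetical format).
    Taking the most active editors first reaches the target with the
    fewest editors.
    """
    target = x * len(article_edits)
    core: set[str] = set()
    covered = 0
    for editor, n_edits in Counter(article_edits).most_common():
        if covered >= target:
            break
        core.add(editor)
        covered += n_edits
    return core


def coordination_by_core(article_edits: list[str], discussion_edits: list[str], x: float) -> int:
    """Number of discussion edits (or comments) made by members of the x-core."""
    core = x_core(article_edits, x)
    return sum(1 for editor in discussion_edits if editor in core)


edits = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(sorted(x_core(edits, 0.6)))                              # ['a', 'b']
print(coordination_by_core(edits, ["a", "c", "c", "b"], 0.6))  # 2
```

Sweeping x from 0.2 to 1.0 and computing `coordination_by_core` for featured and non-featured articles gives curves of the kind plotted on this slide.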
Coordination and Team Composition
[Figure: number of editors versus amount of work, contrasting a crowded environment with a less crowded environment.]
Coordination and Team Composition
Consider the first 100 edits to each article and measure:
• Num. editors
• Eventual size of article in bytes (or word count)
• Num. discussion edits by the 100th article edit
[Figure: final size of article (in KB) versus number of editors in the initial 100 edits; cells are annotated with values of D_a, D_a/S_a, and N.]
Higher levels of coordination associated with crowdedness
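The three per-article measures can be computed from an edit log. This minimal sketch assumes a hypothetical `Edit` record with a timestamp and an editor name, counts discussion edits made no later than the 100th article edit, and groups articles into cells by editor count and a hypothetical 20,000-byte size bin, loosely mirroring the figure's layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Edit:
    timestamp: float  # hypothetical field: when the edit was made
    editor: str       # hypothetical field: who made it


def first_k_measures(article_edits: list[Edit], discussion_edits: list[Edit],
                     final_size_bytes: int, k: int = 100) -> Optional[dict]:
    """Measures over an article's first k edits; None if it has fewer than k."""
    if len(article_edits) < k:
        return None
    first_k = sorted(article_edits, key=lambda e: e.timestamp)[:k]
    cutoff = first_k[-1].timestamp  # time of the k-th article edit
    return {
        "num_editors": len({e.editor for e in first_k}),
        "final_size_bytes": final_size_bytes,
        "discussion_edits": sum(1 for d in discussion_edits if d.timestamp <= cutoff),
    }


def mean_discussion_by_cell(measures: list[dict], size_bin_bytes: int = 20_000) -> dict:
    """Average discussion edits per (num. editors, size bin) cell."""
    cells = defaultdict(list)
    for m in measures:
        key = (m["num_editors"], m["final_size_bytes"] // size_bin_bytes)
        cells[key].append(m["discussion_edits"])
    return {key: mean(values) for key, values in sorted(cells.items())}


edits = [Edit(t, f"user{t % 4}") for t in range(150)]
talk = [Edit(t, "user0") for t in (10, 50, 99, 120)]
m = first_k_measures(edits, talk, final_size_bytes=45_000)
print(m)  # {'num_editors': 4, 'final_size_bytes': 45000, 'discussion_edits': 3}
```

Articles with fewer than 100 edits are skipped here by returning `None`; that filtering rule is an assumption of the sketch.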
Coordination Trade-Off Model
• Each article has N “parts” and E editors.
• Each part is either “empty” or “full”.
• The goal is to fill in as many parts as possible.
When editors arrive:
• If they choose an empty part, the part becomes full (net gain to the article).
• If they choose a full part, the part becomes empty (net loss to the article) with probability α.
Coordination Trade-Off Model
Each editor has 2 options:
– Not coordinate: Choose 2 random parts to edit; contribute between −2 and 2 parts to the article.
– Coordinate: Choose 1 empty part to edit; contribute exactly 1 part to the article.
Each editor coordinates with probability β (fixed).
Find β that maximizes the number of finished parts (in terms of N and E).
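A small Monte Carlo simulation can build intuition for the model. It is a sketch under stated assumptions, with hypothetical function names: the article starts with all parts empty; a coordinating editor picks an empty part uniformly at random and does nothing if none is left; a non-coordinating editor draws its 2 parts uniformly with replacement, editing a part only once if it is drawn twice.

```python
import random


def simulate(n_parts: int, n_editors: int, beta: float, alpha: float,
             rng: random.Random) -> int:
    """One run of the trade-off model; returns the number of full parts at the end."""
    full = [False] * n_parts  # assumption: every part starts empty
    for _ in range(n_editors):
        if rng.random() < beta:
            # Coordinate: fill exactly one empty part (nothing to do if none is left).
            empty = [k for k, is_full in enumerate(full) if not is_full]
            if empty:
                full[rng.choice(empty)] = True
        else:
            # Do not coordinate: draw 2 parts with replacement (assumption);
            # a part drawn twice is edited only once.
            for k in {rng.randrange(n_parts), rng.randrange(n_parts)}:
                if not full[k]:
                    full[k] = True    # empty -> full: net gain
                elif rng.random() < alpha:
                    full[k] = False   # full -> empty with probability alpha: net loss
    return sum(full)


def mean_filled(n_parts: int, n_editors: int, beta: float, alpha: float = 1.0,
                runs: int = 2000, seed: int = 0) -> float:
    """Average number of full parts over independent runs (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(simulate(n_parts, n_editors, beta, alpha, rng) for _ in range(runs))
    return total / runs


print(mean_filled(10, 7, beta=1.0))  # 7.0: every editor coordinates and fills one part
```

Sweeping `beta` over a grid for fixed `n_parts` and `n_editors` gives an empirical counterpart to the question of which β maximizes the number of finished parts. The default `alpha = 1.0` is an arbitrary choice for illustration.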
Optimal Coordination
• Fix N and β and let $P_i$ be the expected number of parts filled after the first i editors.
$$P_{i+1} = A\,P_i + P_0, \qquad A = \frac{2(1-\beta)}{N^2} - \frac{4(1-\beta)}{N} + 1, \qquad P_0 = -\frac{1-\beta}{N} - \beta + 2$$
$$P_i = P_0\,\frac{A^i - 1}{A - 1}$$
Optimal Coordination
[Figure: optimal coordination as a function of N and E.]
• When E > N, β = 1 (fill all parts by coordinating).
• When E is small enough, β = 0 (collisions are unlikely).
• In between, the best β lies away from both 0 and 1.
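The recurrence and its closed form can be evaluated directly to locate the best β for given N and E. This sketch uses the slide's A and P_0, searches a hypothetical grid of β values, and uses hypothetical function names. It does not cap the result at N: with β = 1 the formula gives P_E = E, which exceeds N when E > N.

```python
def coefficients(n_parts: int, beta: float) -> tuple[float, float]:
    """A and P_0 from the slide's recurrence P_{i+1} = A * P_i + P_0."""
    u = 1.0 - beta  # probability that an editor does not coordinate
    a = 2 * u / n_parts**2 - 4 * u / n_parts + 1
    p0 = -u / n_parts - beta + 2
    return a, p0


def expected_filled(n_parts: int, n_editors: int, beta: float) -> float:
    """Closed form P_E = P_0 (A^E - 1) / (A - 1); no cap at n_parts is applied."""
    a, p0 = coefficients(n_parts, beta)
    if abs(a - 1.0) < 1e-12:
        # beta = 1 gives A = 1, where the geometric sum reduces to E * P_0.
        return n_editors * p0
    return p0 * (a**n_editors - 1) / (a - 1)


def best_beta(n_parts: int, n_editors: int, steps: int = 100) -> float:
    """Grid search over beta in [0, 1]; ties go to the smallest beta."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda b: expected_filled(n_parts, n_editors, b))


for e in (1, 40, 90):
    print(e, best_beta(100, e))
```

For N = 100 the search returns β = 0.0 for E = 1, an interior value for E = 40, and β = 1.0 for E = 90, in line with the three regimes listed on the slide.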
Comparison with Wikipedia
[Figure: the model over N and E shown next to Wikipedia articles by article size (bytes) and number of editors.]
Higher levels of coordination in crowded articles
Discussion
• Coordination trade-off: Projects with high performance and projects in crowded environments exhibit higher levels of coordination.
• Implication for design: Coordination mechanisms should be emphasized more strongly on crowded projects.
• Generalizability:
– Findings hold in two very different domains: Wikipedia and GitHub (see paper).
– The proposed framework (model and measures) can be directly adapted to new settings where a group produces a primary work product and uses a separate channel for coordination.
Coordination in GitHub
• Coordination: Number of comments
• Size of project: Number of commits
• Status: Number of watchers
[Figure: two panels of number of comments per commit, plotted against number of watchers and against number of commits.]
Coordination in GitHub
[Figure: final size of project, N (number of commits), versus number of contributors in the initial 100 commits, E.]
Coordination increases when a project becomes crowded: small size and many participants.
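The GitHub measures map onto simple per-repository counts. This minimal sketch assumes a hypothetical `Repo` record holding commit authors in chronological order plus comment and watcher counts; it is not tied to any real GitHub API, and returning 0.0 comments per commit for an empty repository is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    commit_authors: list[str]  # hypothetical: commit authors in chronological order
    num_comments: int          # coordination
    num_watchers: int          # status


def github_measures(repo: Repo, k: int = 100) -> dict:
    """Per-repository measures mirroring the slides' definitions."""
    num_commits = len(repo.commit_authors)  # size of project
    return {
        # arbitrary choice: 0.0 comments per commit when there are no commits
        "comments_per_commit": repo.num_comments / num_commits if num_commits else 0.0,
        "num_commits": num_commits,
        "num_watchers": repo.num_watchers,
        # crowdedness: distinct contributors among the initial k commits
        "contributors_first_k": len(set(repo.commit_authors[:k])),
    }


repo = Repo(["ann", "bob", "ann", "cy"] * 30, num_comments=24, num_watchers=7)
print(github_measures(repo))
```

Binning repositories by `num_commits` and `contributors_first_k` and averaging `comments_per_commit` per bin would give a grid comparable to the Wikipedia analysis.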