Coordination and Efficiency in Decentralized ... - Daniel Romero

4322 North Quad, 105 S. State St. Ann Arbor, MI 48109-1285

Coordination and Efficiency in Decentralized Collaboration
Daniel M. Romero
School of Information, University of Michigan
In collaboration with Dan Huttenlocher and Jon Kleinberg

Coordination in Decentralized Collaboration Environments
Explicit coordination: Requires interaction. Costly and time consuming.
Implicit coordination: Little interaction. Requires a mutual mental model. [Cannon-Bower 90]
Trade-off: Coordination benefits vs. cost
In this paper: How coordination levels vary depending on:
1) quality of output
2) team composition

Coordination in Wikipedia
Data: All of Wikipedia edit history up to April 1, 2007. A total of 3.4 million articles edited by 500K users.
Coordination Tools:
1. Discussion Pages: Discussion of any issues with the article.
2. Edit comments: Comments that explain the nature of each edit.

Discussion and Comment Topics

Discussion Pages
  Justify edit                18
  Suggest edit                33
  Provide reference           20
  Question                    13
  Copyright Issue              8
  Dispute claim in article    12
  Future direction             8
  Other                        6

Edit Comments
  Mentions section            52
  Reverted edit               14
  Minor edit                  19
  Added content               14
  Removed content              7
  Correction                   2
  Mentions other users        14
  Other                       11

Featured Wikipedia Articles
• One featured article chosen each day by the Wikipedia community through a peer review process (2,563).
• They are reviewed for accuracy, neutrality, completeness, and style.
Do editors of featured articles coordinate more or less than editors of non-featured articles?

Coordination and Quality
x-core: Smallest set of editors responsible for x% of all edits.
[Figure: number of discussion edits by x-core vs. x. Red: featured. Blue: non-featured.]
[Figure: number of comments by x-core vs. x. Red: featured. Blue: non-featured.]
Higher levels of coordination associated with higher quality
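The x-core definition above can be sketched in a few lines. This is only an illustration, not the authors' code: the function names, the flat list of edit authors as input, and the choice to pass x as a fraction between 0 and 1 (matching the plot axis) are all assumptions.

```python
from collections import Counter


def x_core(edit_authors, x):
    """Return a smallest list of editors who together made at least
    a fraction x of the edits (x is a fraction in [0, 1], an assumption)."""
    counts = Counter(edit_authors)
    needed = x * len(edit_authors)
    core, covered = [], 0
    # Taking editors from most to least active reaches the target
    # with the fewest editors.
    for editor, n_edits in counts.most_common():
        if covered >= needed:
            break
        core.append(editor)
        covered += n_edits
    return core


def coordination_by_core(article_edit_authors, discussion_edit_authors, x):
    """Count discussion edits made by members of the article's x-core."""
    core = set(x_core(article_edit_authors, x))
    return sum(1 for author in discussion_edit_authors if author in core)
```

When several editors have the same number of edits, the smallest set is not unique; the sketch returns one of them.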

Coordination and Team Composition
[Illustration: number of editors relative to amount of work]
Crowded Environment
Less Crowded Environment

Coordination and Team Composition
Consider the first 100 edits to each article:
• Num. editors
• Eventual size of article in bytes (or word count)
• Num. discussion edits by the 100th article edit
[Figure: final size of article (in KB) vs. number of editors in the initial 100 edits. Region annotations: Da = 32, Da/Sa = 2.5, N = 3983; Da = 7, Da/Sa = 2.0, N = 2511; Da/Sa = 2.5, N = 3624; Da/Sa = 3.08, N = 2874. The Da values 8 and 37 also appear on the plot.]
Higher levels of coordination associated with crowdedness
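A minimal sketch of how the two early-stage measures above could be computed for one article. The input layout (a list of `(timestamp, editor)` pairs for article edits and a list of timestamps for discussion edits) and the function name `early_measures` are assumptions made for illustration, not the format used in the study.

```python
from bisect import bisect_right


def early_measures(article_edits, discussion_times, k=100):
    """Return (number of distinct editors in the first k article edits,
    number of discussion edits made by the time of the k-th article edit).

    article_edits: list of (timestamp, editor) pairs (hypothetical layout).
    discussion_times: list of discussion-edit timestamps.
    Returns None when the article has fewer than k edits.
    """
    first = sorted(article_edits)[:k]
    if len(first) < k:
        return None
    num_editors = len({editor for _, editor in first})
    cutoff = first[-1][0]  # timestamp of the k-th article edit
    num_discussion = bisect_right(sorted(discussion_times), cutoff)
    return num_editors, num_discussion
```

The eventual article size would be read separately from the final revision, so it is left out of this sketch.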

Coordination Trade-Off Model
• Each article has N “parts” and E editors.
• Each part is either “empty” or “full”.
• The goal is to fill in as many parts as possible.
When editors arrive:
• If they choose an empty part, the part will become full (net gain to the article).
• If they choose a full part, the part will become empty (net loss to the article) with probability α.

Coordination Trade-Off Model
Each editor has 2 options:
– Not coordinate: Choose 2 random parts to edit. Contributes between -2 and 2 parts to the article.
– Coordinate: Choose 1 empty part to edit. Contributes exactly 1 part to the article.
Each editor coordinates with probability β (fixed).
Find β that maximizes the number of finished parts (in terms of N and E).
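The model can be explored with a small Monte Carlo sketch. Several details are assumptions, since the slides do not spell them out: the two random parts are drawn independently and uniformly, a part drawn twice by the same editor counts as a single edit, and a coordinating editor who finds no empty part does nothing. The function names are hypothetical.

```python
import random


def simulate_article(n_parts, n_editors, beta, alpha=1.0, rng=None):
    """Run the model once and return the number of full parts at the end."""
    rng = rng or random.Random()
    full = [False] * n_parts
    n_full = 0
    for _ in range(n_editors):
        if rng.random() < beta:
            # Coordinating editor: fills exactly one empty part, if any is left.
            if n_full < n_parts:
                empty = [k for k, is_full in enumerate(full) if not is_full]
                full[rng.choice(empty)] = True
                n_full += 1
        else:
            # Non-coordinating editor: two uniform draws; a repeated draw
            # is treated as one edit (assumption).
            picks = {rng.randrange(n_parts), rng.randrange(n_parts)}
            for k in picks:
                if not full[k]:
                    full[k] = True
                    n_full += 1
                elif rng.random() < alpha:
                    # Editing a full part empties it with probability alpha.
                    full[k] = False
                    n_full -= 1
    return n_full


def mean_filled(n_parts, n_editors, beta, alpha=1.0, trials=2000, seed=0):
    """Average number of full parts over independent runs."""
    rng = random.Random(seed)
    total = sum(
        simulate_article(n_parts, n_editors, beta, alpha, rng)
        for _ in range(trials)
    )
    return total / trials
```

Comparing `mean_filled` across a grid of β values for fixed N and E gives a rough picture of where coordination pays off.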

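Optimal Coordination
• Fix N and β and let P_i be the expected number of parts filled after the first i editors.

$$P_{i+1} = A P_i + P_0$$
$$A = \frac{2(1-\beta)}{N^2} - \frac{4(1-\beta)}{N} + 1$$
$$P_0 = -\frac{1-\beta}{N} - \beta + 2$$
$$P_i = P_0 \, \frac{A^i - 1}{A - 1}$$

Here P_0 is the constant term of the recurrence, and the closed form assumes the article starts with no parts filled.

Optimal Coordination
[Figure: optimal β as a function of N and E]
• When E > N, β = 1 (fill all parts by coordinating)
• When E is small enough, β = 0 (colliding is unlikely)
• In between, the best β lies strictly between 0 and 1.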
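The three regimes can be checked numerically with a rough sketch. It assumes the recurrence reads A = 1 − 4(1−β)/N + 2(1−β)/N² with constant term P_0 = 2 − β − (1−β)/N, and it starts from an empty article. The cap at N parts is an added assumption (the linear recurrence alone can exceed N), and the recurrence is iterated rather than evaluated in closed form because A = 1 at β = 1. The function names are hypothetical.

```python
def expected_filled(n_parts, n_editors, beta):
    """Iterate P_{i+1} = A * P_i + P_0 from an empty article, capped at n_parts."""
    a = 1 - 4 * (1 - beta) / n_parts + 2 * (1 - beta) / n_parts**2
    p0 = 2 - beta - (1 - beta) / n_parts
    p = 0.0
    for _ in range(n_editors):
        # Cap added for this sketch: an article cannot hold more than N parts.
        p = min(float(n_parts), a * p + p0)
    return p


def best_beta(n_parts, n_editors, grid=101):
    """Grid search for the coordination probability with the largest
    expected number of filled parts (first maximizer on ties)."""
    betas = [k / (grid - 1) for k in range(grid)]
    return max(betas, key=lambda b: expected_filled(n_parts, n_editors, b))
```

For example, with N = 100 the sketch gives β = 0 as the best choice for a handful of editors, and an interior value of β for E = 40. When E > N, β = 1 fills the article, although under the cap other large values of β can tie with it.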

Comparison with Wikipedia
[Figure: model parameters N and E shown against article size (bytes) and number of editors]
Higher levels of coordination in crowded articles

Discussion
• Coordination trade-off: Projects with high performance and in crowded environments exhibit higher levels of coordination.
• Implication for design: Coordination mechanisms should be emphasized more strongly on crowded projects.
• Generalizability:
  o Findings hold in two very different domains: Wikipedia and GitHub (see paper).
  o Proposed framework (model and measures) can be directly adapted to new settings, where a group produces a primary work product and has a separate channel for coordination.

Coordination in GitHub
• Coordination: Number of comments
• Size of project: Number of commits
• Status: Number of watchers
[Figure: number of comments per commit vs. number of watchers]
[Figure: number of comments per commit vs. number of commits]

Coordination in GitHub
[Figure: final size of project (number of commits, N) vs. number of contributors in the initial 100 commits (E)]
Coordination increases when a project becomes crowded (small size and many participants)