Velocity Culture - O'Reilly Media

Velocity Culture (The Unmet Challenge in Ops)
Jon Jenkins
Amazon.com
O'Reilly Velocity Conference, June 16, 2011

The success of Velocity culture depends on linking it to the business.
Web performance has succeeded in doing this; ops less so.

Ops needs to focus on closing this gap.

2009: Bing, Google
2010: Shopzilla
2011: MSN, DoubleClick

ops = business
ops != business
ops ? business

ops          business

What if the size of your server fleet was completely flexible?

Case Study 1 - Scaling Down

[Chart: Typical Weekly Traffic to amazon.com, Sunday through Saturday; annotated 39% and 61%]

[Chart: November Traffic for amazon.com; annotated 76% and 24%]

Capacity Planning = Spending Money

Capacity Optimization = Saving Money

The Problem
• Web site hardware is underutilized
• Traffic spikes require heroic effort
• Scaling is non-linear

November 10, 2010

Outcomes
• All traffic for www.amazon.com is now served by EC2
• Reduced spending on server capacity
• Fleet scales dynamically in increments as small as a single host
• Traffic spikes can be handled with ease
• Cultural change
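To make the "increments as small as a single host" outcome concrete, here is a minimal sketch comparing a fixed fleet sized for peak with a fleet resized every hour. The load figures, the per-host capacity, and the 15% headroom are all hypothetical values chosen for illustration; none of them come from the talk.

```python
import math


def hosts_needed(requests_per_sec, per_host_capacity, headroom=0.15):
    """Smallest whole number of hosts that serves the load plus headroom."""
    if requests_per_sec <= 0:
        return 0
    return math.ceil(requests_per_sec * (1 + headroom) / per_host_capacity)


def host_hours(hourly_load, per_host_capacity, headroom=0.15):
    """Return (fixed, flexible) host-hours for a series of hourly loads.

    fixed:    a fleet sized once for the peak hour and never changed
    flexible: a fleet resized each hour, one host at a time
    """
    peak_size = hosts_needed(max(hourly_load), per_host_capacity, headroom)
    fixed = peak_size * len(hourly_load)
    flexible = sum(
        hosts_needed(load, per_host_capacity, headroom) for load in hourly_load
    )
    return fixed, flexible


if __name__ == "__main__":
    # Hypothetical requests per second for six consecutive hours.
    load = [400, 250, 300, 800, 1000, 600]
    fixed, flexible = host_hours(load, per_host_capacity=100)
    print(f"fixed fleet:    {fixed} host-hours")
    print(f"flexible fleet: {flexible} host-hours")
    print(f"saved:          {1 - flexible / fixed:.0%}")
```

The gap between the two totals is the underutilized capacity that the fixed fleet pays for; the wider the swing between peak and trough, the larger that gap becomes.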

Case Study 2 - Scaling Up

Continuous Deployment

Amazon May Deployment Stats (production hosts & environments only)
• 11.6 seconds: mean time between deployments (weekday)
• 1,079: max # of deployments in a single hour
• 10,000: mean # of hosts simultaneously receiving a deployment
• 30,000: max # of hosts simultaneously receiving a deployment
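As a rough sense of scale, the two rate figures above can be converted into each other's units. This sketch assumes deployments are spread evenly over a 24-hour weekday, which the slide does not state, so the daily figure is only an order-of-magnitude estimate.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def deployments_per_day(mean_gap_seconds):
    """Deployments in 24 hours if one starts every mean_gap_seconds."""
    return SECONDS_PER_DAY / mean_gap_seconds


def gap_at_rate(deployments_per_hour):
    """Average seconds between deployments at a given hourly rate."""
    return SECONDS_PER_HOUR / deployments_per_hour


if __name__ == "__main__":
    # 11.6 s mean gap (weekday) and 1,079 deployments in the peak hour
    # are the figures quoted on the slide.
    print(f"{deployments_per_day(11.6):.0f} deployments per weekday (assumed even spread)")
    print(f"{gap_at_rate(1079):.2f} seconds between deployments in the peak hour")
```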

[Diagram, repeated across five slides: a Load Balancer in front of Availability Zones 1, 2, and 3, each containing hosts WWW1, WWW2, WWW3 ... WWWn]

The Problem
• Upgrading software on a fixed fleet requires a complex workflow
• Upgrading software on a fixed fleet is a slow process
• Dealing with failure scenarios requires emergent, high-judgment decisions

[Diagram sequence: the same Load Balancer and Availability Zones 1, 2, and 3, followed by three slides showing a second set of Availability Zones 1, 2, and 3 (hosts WWW1, WWW2, WWW3 ... WWWn) alongside the first]

Results
• 75% reduction in outages triggered by software deployments since 2006
• 90% reduction in outage minutes triggered by software deployments
• ~0.001% of software deployments cause an outage
• Instantaneous automated rollback
• Reduction in complexity

The Challenge for Velocity 2012

Long live Ops!
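One way to picture "instantaneous automated rollback" is a load balancer that points at one whole fleet at a time, so that reverting is a pointer swap rather than a host-by-host downgrade. The sketch below is an illustration of that general idea only, not a description of Amazon's tooling: the `Fleet` and `LoadBalancer` classes, the `deploy` function, and the `is_healthy` check are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fleet:
    """A set of hosts all running the same software version (hypothetical)."""
    version: str
    hosts: List[str]


class LoadBalancer:
    """Routes all traffic to exactly one fleet and remembers the prior one."""

    def __init__(self, active: Fleet):
        self.active = active
        self.previous: Optional[Fleet] = None

    def switch_to(self, fleet: Fleet) -> None:
        # Keep the old fleet around so that rollback needs no redeployment.
        self.previous = self.active
        self.active = fleet

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous fleet to roll back to")
        self.active, self.previous = self.previous, None


def deploy(lb: LoadBalancer, new_fleet: Fleet,
           is_healthy: Callable[[Fleet], bool]) -> bool:
    """Switch traffic to new_fleet; revert automatically if it is unhealthy."""
    lb.switch_to(new_fleet)
    if is_healthy(new_fleet):
        return True
    lb.rollback()
    return False


if __name__ == "__main__":
    lb = LoadBalancer(Fleet("v1", ["WWW1", "WWW2", "WWW3"]))
    ok = deploy(lb, Fleet("v2", ["WWW1", "WWW2", "WWW3"]), lambda f: False)
    print("deployed" if ok else "rolled back", "- serving", lb.active.version)
```

Because the old fleet stays intact until the new one is judged healthy, the failure path involves no high-judgment decision: the health check result alone determines which fleet serves traffic.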