FAWNdamentally Power-efficient Clusters

Vijay Vasudevan, Jason Franklin, David Andersen, Amar Phanishayee, Lawrence Tan, Michael Kaminsky*, Iulian Moraru
Carnegie Mellon University, *Intel Research Pittsburgh

May 20, 2009

Monthly energy statement considered harmful

• Power is a limiting factor in computing
• 3-year TCO will soon be dominated by power cost [EPA 2007]
• Power influences datacenter location and technology choices

Approaches to saving power

• Infrastructure efficiency: power generation, power distribution, cooling
• Dynamic power scaling: sleeping when idle, rate adaptation, VM consolidation
• Computational efficiency: FAWN


Goal of computational efficiency: reduce the amount of energy needed to do useful work.

FAWN: Fast Array of Wimpy Nodes

Improve the computational efficiency of data-intensive computing using an array of well-balanced, low-power systems.

[Figure: FAWN cluster diagram; labels lost in extraction]


Wimpy node: AMD Geode processor, 256MB DRAM, 4GB CompactFlash

Target: Data-intensive computing

• Large amounts of data
• Highly parallelizable
• Fine-grained, independent tasks

Workloads amenable to a “scale-out” approach

Outline
• What is FAWN?
• Why FAWN?
• When FAWN?
• Challenges (How FAWN?)

Why FAWN?
1. Fixed costs make dynamic power scaling difficult
2. FAWN balances system to save energy
3. FAWN targets sweet-spot in efficiency
4. FAWN reduces peak power consumption


1. Fixed power costs dominate

[Figure: power (W, 0 to 5000) vs. system utilization (0 to 100%) for No DVFS, DVFS, and Ideal; adapted from Tolia et al., HotPower 08]

Fixed power costs: 70% of peak power is drawn at 0% utilization!
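A minimal sketch of why a 70% idle floor hurts: assume power rises linearly from idle to peak (an approximation of the "No DVFS" curve, not the measured data) and compute the energy spent per unit of work relative to running at full load. The 5000 W peak is a hypothetical value chosen to match the plot's axis range.

```python
# Sketch: energy cost of fixed power at low utilization.
# ASSUMPTION: power rises linearly from idle to peak; the curves in
# Tolia et al. are measured, this is only an approximation.
P_PEAK_W = 5000.0     # hypothetical peak power, chosen to match the plot's axis
IDLE_FRACTION = 0.70  # from the slide: 70% of peak power at 0% utilization


def power_w(util: float) -> float:
    """Power drawn at utilization `util` in [0, 1] under the linear model."""
    p_idle = IDLE_FRACTION * P_PEAK_W
    return p_idle + (P_PEAK_W - p_idle) * util


def energy_per_work_vs_peak(util: float) -> float:
    """Energy per unit of work, normalized to running at 100% utilization.

    Useful work per second is proportional to utilization, so energy per
    unit of work is P(u) / u; dividing by P(1) / 1 normalizes it.
    An ideal energy-proportional system would return 1.0 everywhere.
    """
    if util <= 0.0:
        raise ValueError("no useful work is done at 0% utilization")
    return (power_w(util) / util) / power_w(1.0)


if __name__ == "__main__":
    for u in (0.1, 0.3, 0.5, 1.0):
        print(f"utilization {u:4.0%}: {power_w(u):6.0f} W, "
              f"{energy_per_work_vs_peak(u):.2f}x energy per unit of work")
```

Under this model, a server at 10% utilization spends about 7.3x as much energy per unit of work as one running flat out, which is why dynamic power scaling alone cannot remove the fixed cost.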

2. Balancing to save energy

[Figure: "CPU/Disk Gap", the CPU-to-disk-seek speed ratio (log scale, 10^4 to 10^8) by year, 1980 to 2005]

How do we balance?
• Big CPUs clocked down?
• Embedded CPUs?

Why not use more disks with big CPUs?
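One way to read the CPU/disk gap: if every random lookup costs some CPU instructions plus one disk seek, the number of disks needed to keep a CPU busy grows with CPU speed. The sketch below uses hypothetical values (100,000 instructions per lookup, a 10 ms seek, and the two CPU speeds) purely for illustration.

```python
import math

# Sketch: how many disks does it take to keep a CPU busy on seek-bound lookups?
# ASSUMPTIONS (hypothetical, for illustration): each lookup costs a fixed
# number of CPU instructions plus exactly one random seek, and each disk
# serves one seek at a time.


def instructions_per_seek(cpu_mips: float, seek_ms: float) -> float:
    """CPU-to-disk-seek speed ratio: instructions executed during one seek."""
    return cpu_mips * 1e6 * seek_ms / 1000.0


def disks_to_balance(cpu_mips: float, instr_per_lookup: float, seek_ms: float) -> int:
    """Disks needed so that aggregate seek capacity matches the CPU's lookup rate."""
    lookups_per_sec = cpu_mips * 1e6 / instr_per_lookup  # CPU-limited lookup rate
    seeks_per_sec_per_disk = 1000.0 / seek_ms            # seek-limited rate per disk
    return math.ceil(lookups_per_sec / seeks_per_sec_per_disk)


if __name__ == "__main__":
    for name, mips in (("fast CPU", 10_000), ("wimpy CPU", 500)):
        print(f"{name}: {instructions_per_seek(mips, 10):.0e} instructions per seek, "
              f"{disks_to_balance(mips, 100_000, 10)} disks to balance")
```

With these assumed numbers, a 10,000 MIPS CPU needs 1,000 disks to stay busy while a 500 MIPS CPU needs 50, so a slower CPU is far easier to balance against a small amount of storage.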


3. Targeting the sweet-spot in efficiency

[Figure: "Speed vs. Efficiency", instructions/sec/W in millions (0 to 2500) vs. instructions/sec in millions (log scale, 1 to 100000) for Custom ARM Mote, XScale 800MHz, Atom Z500, and Xeon7350]

• Fast processors mask the memory wall at the cost of efficiency
• Fixed power costs can dominate efficiency for slow processors
• FAWN targets the sweet spot in processor efficiency when fixed costs are included
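The shape of this argument can be reproduced with a toy model: efficiency = speed / (fixed power + CPU power), where CPU power grows faster than linearly with speed. All three constants are hypothetical; only the existence of an interior peak, not its location, is meant to mirror the slide.

```python
# Toy model of the efficiency sweet spot. ASSUMED constants, not measured:
#   power(s) = P_FIXED_W + C_W * s**ALPHA,  s in millions of instructions/sec
P_FIXED_W = 1.0   # hypothetical fixed platform power (board, memory, PSU), watts
C_W = 1e-6        # hypothetical scaling constant, watts per MIPS**ALPHA
ALPHA = 2.0       # hypothetical exponent; > 1 means faster cores cost more per instruction


def power_w(mips: float) -> float:
    return P_FIXED_W + C_W * mips ** ALPHA


def efficiency(mips: float) -> float:
    """Millions of instructions per joule (MIPS per watt)."""
    return mips / power_w(mips)


def sweet_spot_closed_form() -> float:
    # d/ds [s / (P + c s^a)] = 0  =>  P = (a - 1) c s^a
    return (P_FIXED_W / ((ALPHA - 1.0) * C_W)) ** (1.0 / ALPHA)


def sweet_spot_search() -> float:
    # Sweep 1 to 100,000 MIPS on a log grid and keep the most efficient speed.
    speeds = [10 ** (k / 100) for k in range(0, 501)]
    return max(speeds, key=efficiency)


if __name__ == "__main__":
    for s in (10, 100, 1000, 10_000, 100_000):
        print(f"{s:>7} MIPS -> {efficiency(s):7.1f} MIPS/W")
    print("sweet spot (closed form):", round(sweet_spot_closed_form()))
    print("sweet spot (grid search):", round(sweet_spot_search()))
```

Below the peak the fixed term dominates and efficiency falls roughly in proportion to speed; above it the super-linear CPU term dominates. Raising P_FIXED_W moves the peak to faster processors, which is why the slide stresses including fixed costs.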

4. Reducing peak power consumption

• Provisioning for peak power requires:
  1. worst-case cooling capacity
  2. UPS systems to cover power failures
  3. investment in power generation and substations


What is FAWN good for?
• Random-access workloads (key-value lookup)
• Scan-bound workloads (Hadoop, data analytics)
• CPU-bound workloads (compression, encryption)

Important metrics
• Performance: work / time
• Efficiency: perf / watt
• Density: perf / volume
• Cost: perf / $
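Each metric divides performance by a resource. Because a watt is a joule per second, perf/watt for a throughput measure reduces to work per joule, which is why efficiency results are reported in queries/Joule and MB/J. A small sketch; the units chosen for volume (liters) and cost (dollars) and all example numbers are hypothetical.

```python
from dataclasses import dataclass

# Sketch of the four metrics. Field units (liters, dollars) and the
# example values below are hypothetical.


@dataclass
class Measurement:
    work: float      # useful work completed, e.g. queries answered
    seconds: float   # wall-clock time of the run
    watts: float     # average power drawn during the run
    liters: float    # physical volume occupied by the system
    dollars: float   # purchase cost of the system

    def performance(self) -> float:
        """Performance: work / time."""
        return self.work / self.seconds

    def efficiency(self) -> float:
        """Efficiency: perf / watt, i.e. work per joule."""
        return self.performance() / self.watts

    def density(self) -> float:
        """Density: perf / volume."""
        return self.performance() / self.liters

    def cost_efficiency(self) -> float:
        """Cost: perf / $."""
        return self.performance() / self.dollars


if __name__ == "__main__":
    m = Measurement(work=10_000, seconds=10, watts=4, liters=2, dollars=250)
    print(m.performance(), m.efficiency(), m.density(), m.cost_efficiency())
```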

Random access workloads

FAWN is 6-200x more efficient than traditional systems.

System                    Performance (queries/sec)   Efficiency (queries/Joule)
FAWN + CF (4W)            1697                        424.25
Traditional + HD (87W)    177                         2.034
Traditional + SSD (83W)   5800                        69.88
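The efficiency column follows from the performance column and each system's power draw, assuming the wattage in each legend is the average draw during the benchmark. The short check below recomputes it and the resulting ratios: about 6x over the SSD system and about 200x over the disk system.

```python
# Efficiency = throughput / power, using the numbers reported on the slide.
# Queries/sec divided by watts (joules/sec) gives queries per joule.
SYSTEMS = {
    "FAWN + CF": (1697, 4),            # (queries/sec, watts)
    "Traditional + HD": (177, 87),
    "Traditional + SSD": (5800, 83),
}


def queries_per_joule(qps: float, watts: float) -> float:
    return qps / watts


def efficiency_table() -> dict:
    return {name: queries_per_joule(q, w) for name, (q, w) in SYSTEMS.items()}


if __name__ == "__main__":
    eff = efficiency_table()
    for name, e in eff.items():
        print(f"{name:<18} {e:8.2f} queries/J")
    for name in ("Traditional + SSD", "Traditional + HD"):
        print(f"FAWN vs {name}: {eff['FAWN + CF'] / eff[name]:.0f}x")
```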

CPU-bound encryption

AES encryption/decryption of a 512MB file with a 256-bit key. FAWN is 2x more efficient for CPU-bound operations!

System                    Performance (MB/s)   Efficiency (MB/J)
FAWN (5W)                 3.65                 0.73
Traditional + HD (87W)    51.15                0.365

When to use FAWN for random access workloads?
• Total cost of ownership: capital cost + 3-year power cost @ $0.10/kWh
• What is the cheapest architecture for serving random access workloads?
  • Traditional + {Disks, SSD, DRAM}?
  • FAWN + {Disks, SSD, DRAM}?
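The TCO definition above is simple enough to compute directly. This sketch assumes `avg_watts` is the average wall power over the three years and, like the slide's formula, leaves out cooling and other facility overheads.

```python
# Sketch of the TCO definition:
#   TCO = capital cost + 3 years of power at $0.10/kWh.
# ASSUMPTION: avg_watts is the average wall power; cooling and other
# facility overheads are not included.
HOURS_PER_YEAR = 365 * 24
YEARS = 3
PRICE_PER_KWH = 0.10


def energy_cost_dollars(avg_watts: float, years: int = YEARS,
                        price_per_kwh: float = PRICE_PER_KWH) -> float:
    kwh = avg_watts * HOURS_PER_YEAR * years / 1000.0
    return kwh * price_per_kwh


def tco_dollars(capital_dollars: float, avg_watts: float) -> float:
    return capital_dollars + energy_cost_dollars(avg_watts)


if __name__ == "__main__":
    # Power draws from the random-access results; capital costs omitted.
    for name, watts in (("FAWN + CF", 4), ("Traditional + SSD", 83), ("Traditional + HD", 87)):
        print(f"{name:<18} 3-year power cost: ${energy_cost_dollars(watts):7.2f}")
```

At these rates each watt of average draw costs about $2.63 over three years: about $229 for an 87 W server versus about $11 for a 4 W node.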

Architecture with lowest TCO for random access workloads

[Figure: dataset size in TB vs. query rate (millions/sec), both on log scales, with regions labeled FAWN + Disk, FAWN + SSD, and FAWN + DRAM]

• Ratio of query rate to dataset size informs storage technology
• FAWN-based systems can provide lower cost per {GB, QueryRate}
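The claim that the ratio of query rate to dataset size picks the storage technology can be sketched as a small optimization: a cluster needs enough nodes both to hold the dataset and to serve the query rate, and the node type with the lowest total TCO wins. The per-node capacities, lookup rates, prices, and wattages below are hypothetical placeholders, not the values behind the figure.

```python
import math
from dataclasses import dataclass

HOURS_3Y = 3 * 365 * 24   # three years of operation
PRICE_PER_KWH = 0.10      # from the TCO definition


@dataclass(frozen=True)
class NodeType:
    name: str
    gb_per_node: float    # usable storage per node
    qps_per_node: float   # sustainable random lookups/sec per node
    capital: float        # purchase cost per node, dollars
    watts: float          # average power per node

    def tco(self) -> float:
        return self.capital + self.watts * HOURS_3Y / 1000 * PRICE_PER_KWH


# HYPOTHETICAL node types; the numbers are illustrative only.
OPTIONS = [
    NodeType("FAWN + Disk", gb_per_node=500, qps_per_node=250, capital=350, watts=20),
    NodeType("FAWN + SSD", gb_per_node=32, qps_per_node=35_000, capital=500, watts=15),
    NodeType("FAWN + DRAM", gb_per_node=2, qps_per_node=100_000, capital=150, watts=15),
]


def cluster_tco(node: NodeType, dataset_gb: float, qps: float) -> float:
    # The cluster must hold the whole dataset AND serve the full query rate,
    # so the node count is set by whichever requirement is larger.
    nodes = max(math.ceil(dataset_gb / node.gb_per_node),
                math.ceil(qps / node.qps_per_node))
    return nodes * node.tco()


def cheapest(dataset_gb: float, qps: float) -> str:
    return min(OPTIONS, key=lambda n: cluster_tco(n, dataset_gb, qps)).name


if __name__ == "__main__":
    for gb, qps in ((100_000, 1_000), (1_000, 1_000_000), (10, 10_000_000)):
        print(f"{gb:>7} GB at {qps:>10} queries/sec -> {cheapest(gb, qps)}")
```

With these placeholder numbers, a large, rarely queried dataset favors disks, a moderate ratio favors SSDs, and a small, hot dataset favors DRAM. Traditional server configurations can be added to OPTIONS the same way for a head-to-head comparison.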

Challenges

“Each decimal order of magnitude increase in parallelism requires a major redesign and rewrite of parallel code” - Kathy Yelick

• Algorithms and architectures at 10x scale
• Dealing with Amdahl’s law
• High performance using low-performance nodes
• Today’s software may not run out of the box
• Manageability, failures, network design, power cost vs. engineering cost
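Amdahl’s law bounds the speedup from adding nodes by the serial fraction of a job, and the bound bites harder when the same aggregate performance comes from roughly 10x as many slower nodes. The sketch below uses a hypothetical 99% parallel fraction and a hypothetical 10x per-node slowdown.

```python
# Amdahl's law: speedup on n nodes when a fraction p of the work is
# perfectly parallel and the rest is serial.


def amdahl_speedup(p: float, n: int) -> float:
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return 1.0 / ((1.0 - p) + p / n)


def runtime(p: float, n: int, node_speed: float) -> float:
    """Time to finish one unit of work on n nodes of relative speed node_speed.

    The serial part runs on a single node; the parallel part is split
    evenly across all n nodes.
    """
    return (1.0 - p) / node_speed + p / (n * node_speed)


if __name__ == "__main__":
    p = 0.99  # hypothetical parallel fraction
    for n in (10, 100, 1000):
        print(f"n={n:>4}: speedup {amdahl_speedup(p, n):6.2f}")
    # Hypothetical comparison: 100 fast nodes vs. 1000 wimpy nodes that are
    # each 10x slower (same aggregate speed).
    print("fast:", runtime(p, 100, 1.0), "wimpy:", runtime(p, 1000, 0.1))
```

With 1% serial work, 1000 wimpy nodes with the same aggregate speed as 100 fast nodes take about 5.5x longer, because the serial portion runs on a single slow node.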

Conclusion
• FAWN improves the computational efficiency of datacenters
• Informed by fundamental system power trends
• Challenges: programming for 10x scale, running today’s software on yesterday’s machines...

Hot enough for industry
