Energy Efficient Scheduling of MapReduce Workloads on Heterogeneous Clusters

Nezih Yigitbasi
Delft University of Technology, the Netherlands
[email protected]

Kushal Datta, Nilesh Jain, and Theodore Willke
Intel Labs, Hillsboro, OR
{kushal.datta,nilesh.jain,theodore.l.willke}@intel.com

ABSTRACT
Energy efficiency has become the center of attention in emerging data center infrastructures as increasing energy costs continue to outgrow all other operating expenditures. In this work we investigate energy aware scheduling heuristics to increase the energy efficiency of MapReduce workloads on heterogeneous Hadoop clusters comprising both low power (wimpy) and high performance (brawny) nodes. We first make a case for heterogeneity by showing that low power Intel Atom processors and high performance Intel Sandy Bridge processors are more energy efficient for I/O bound workloads and CPU bound workloads, respectively. Then we present several energy efficient scheduling heuristics that exploit this heterogeneity and the real-time power measurements enabled by modern processor architectures. Through experiments on a 23-node heterogeneous Hadoop cluster we demonstrate up to 27% better energy efficiency with our heuristics compared with the default Hadoop scheduler.

Categories and Subject Descriptors
H.3.4 [Systems and Software]: Distributed systems; D.4.8 [Performance]: Measurements

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GCM'2011, December 12th, 2011, Lisbon, Portugal. Copyright 2011 ACM 978-1-4503-1064-2/11/12 ...$10.00.

1. INTRODUCTION
The power consumption of data centers is expected to reach unprecedented scales. The EPA estimates that US data centers will consume 100 billion kilowatt hours annually by 2011 at a cost of $7.4 billion per year [8]. Moreover, the annual energy cost has already surpassed the annualized capital cost of servers [4], and is expected to surpass all other operating costs in future deployments. Even worse, the energy problem is exacerbated by increasing processor and rack densities. Therefore, energy efficiency is becoming a first class citizen in many aspects of modern data centers, and industry and academia are seeking ways to improve data center energy efficiency. Researchers have been investigating diverse solutions to this problem, such as real-time power monitoring for better power management, operating the systems at optimal efficiency, smarter cooling techniques, and smarter workload placement techniques. This work slowed the growth of energy consumption significantly between 2005 and 2010 [12].

There is an increasing amount of research focused on data center energy efficiency. In general, researchers have addressed energy efficient techniques for consolidation and workload placement [19, 20, 21], and load distribution [15]. With the increasing interest in MapReduce [7] for data intensive computing, they have also endeavored to improve the energy efficiency of MapReduce clusters [16, 13, 11]. Recently the feasibility of deploying low power (wimpy) nodes in clusters was demonstrated [1, 6], and these studies have led to interesting discussions about the use of wimpy nodes in data centers [14, 9]. However, little work has been done on energy efficient scheduling in heterogeneous MapReduce clusters comprised of both wimpy and brawny (high performance) nodes.

In this work we fill this gap by investigating energy efficient scheduling techniques for heterogeneous MapReduce clusters. Toward this end, we first make a case for heterogeneity by showing that wimpy Intel Atom processors and brawny Intel Sandy Bridge processors are more energy efficient for I/O bound workloads and CPU bound workloads, respectively. For characterizing the energy efficiency of CPU bound workloads we use the performance per watt (perf/watt) metric, where performance is 1/completion time, and for I/O bound workloads we use IOPS/watt as the energy efficiency metric, since for I/O bound workloads performance is well characterized by I/O operations per second (IOPS).

After making a case for heterogeneity, we investigate several scheduling heuristics for energy efficient execution of MapReduce workloads that exploit the cluster heterogeneity and the real-time power measurements enabled by recent processor architectures. Our energy efficient scheduling heuristics take both performance and power into account when making scheduling decisions. Moreover, these heuristics address the nontrivial trade-off between the conflicting performance and power goals: although running a particular task on a wimpy node may result in lower power consumption, it may also take longer for the job to complete, resulting in degraded performance and increased overall energy consumption.

With this work we make the following contributions. First, we demonstrate a case for heterogeneity by showing that, for I/O bound MapReduce workloads, Atom nodes are around 2.5x more energy efficient than the Sandy Bridge nodes. Second, we propose and evaluate three scheduling heuristics on a heterogeneous Hadoop cluster, and we show that our heuristics are able to provide up to 27% better energy efficiency compared with the default Hadoop scheduler.

Platform       CPU                                                    Memory   Storage           CPU TDP [W]
Sandy Bridge   Intel Core(TM) i7-2600 (Sandy Bridge) w/HT @ 3.4GHz    8GB      Intel X25-E SSD   95
Atom           Intel Atom D510 w/HT @ 1.66GHz                         4GB      Intel X25-E SSD   13

Table 1: Specifications of the platforms that we use in our experiments.

2. BACKGROUND: MAPREDUCE AND HADOOP
In this section, we briefly describe MapReduce [7], which is a programming model for clusters, and Hadoop [2], which is an open source MapReduce implementation. MapReduce is typically used for processing large amounts of data on commodity clusters. The user specifies a map function that processes a key-value pair to produce a list of intermediate key-value pairs, and a reduce function to aggregate the output of the map function. Hadoop is a framework that implements the MapReduce programming model, and it simplifies cluster programming by taking care of automatic parallelization, load balancing, and fault tolerance.

A typical Hadoop cluster runs over the Hadoop Distributed File System (HDFS) and has a single job tracker (master) that is responsible for managing the task trackers (slaves) running on each node in the cluster. When a user submits a job consisting of map and reduce functions, Hadoop first divides the input data residing in HDFS into splits, and then divides the job into several tasks depending on the size of the input data. The job tracker schedules these tasks in response to the heartbeats sent periodically by the task trackers. A single map task is run for every input split, producing a list of key-value pairs. Hadoop then partitions the map output based on the keys and runs reduce tasks over these partitions, writing the final output to HDFS.

In this work we use the default scheduling policy of Hadoop as the baseline for our performance evaluation. The default Hadoop scheduler is first in first out (FIFO) with multiple priority levels. On receiving a heartbeat from a task tracker with information about its number of free map/reduce slots, the scheduler scans the job queue in priority order and determines which tasks to assign to this task tracker, considering data locality for map tasks to improve performance.
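To make the programming model concrete, the canonical word count job expressed in Hadoop's Java MapReduce API looks roughly as follows. This is an illustrative sketch based on the standard Hadoop example, not code from this paper.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Map: emit (word, 1) for every word in the input split.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce: sum the counts emitted for each word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input resides in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output written back to HDFS
        job.waitForCompletion(true);
      }
    }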

3. A CASE FOR HETEROGENEITY
In this section, we demonstrate a case for heterogeneity through experiments with two different platforms: low power Atom processors (the wimpy processors) and high performance Sandy Bridge (SNB) processors (the brawny processors). Table 1 shows the specifications of these platforms. We show in this section that there is a scheduling opportunity that we can exploit, since for I/O bound workloads wimpy nodes provide up to 2.5x better energy efficiency than the brawny nodes.

As the workload we use three MapReduce applications from the HiBench [10] benchmark suite: word count, sort, and nutch. Word count and sort are micro benchmarks that are available in the Hadoop distribution, and nutch is the indexing system of the Apache Nutch search engine [3]. These applications are representative of real MapReduce workloads, as the computation they perform covers common use cases of MapReduce, namely transforming data from one representation to another, extracting a small amount of data from a large data set, and large-scale indexing [10].

It is unfair to compare wimpy node and brawny node clusters of the same size, because a wimpy node cluster will be larger than a brawny node cluster when the two clusters are built either to provide the same level of performance or to be provisioned the same amount of power. In this work we use the Thermal Design Power (TDP) of the two processors to determine the number of nodes to deploy in the clusters, so that the same amount of power is provisioned to both clusters according to their TDP values. As the TDP ratio of the two platforms is 1:7, we create two different Hadoop clusters for our experiments: an Atom cluster of seven Atom nodes and an SNB "cluster" (a pseudo cluster) of a single SNB node. With the single node SNB cluster we are neglecting the impact of workload partitioning and network performance, which are two important concerns for MapReduce workloads. However, even if we deployed more than one SNB node and considered these effects, our results would still hold, as our evaluation with a single node SNB cluster provides an upper bound for the perf/watt of the SNB cluster: when we increase the cluster size the perf/watt gets worse, since the power consumption scales linearly with the number of nodes while the performance does not.

Instead of using the holistic power measured with a power meter at the wall socket we use a linear power model, for two reasons. First, some of the hardware components on our experimental platforms, such as the SATA headers and the USB and HDMI ports, add noticeable overheads to the measured holistic power. Second, these overheads are significantly different on the Atom and SNB nodes. Therefore, to model the holistic power of the Atom and SNB nodes we first determine the power consumed by the processor package for a particular workload. Then, through micro benchmarks, we determine the power consumption of the other components on the platform, such as the storage devices, main memory, and the network interface. We model the holistic power of a single node as the sum of its package power and the power consumption of the storage devices, network interface, and main memory, taking into account an overhead of 5% for unanticipated power consumption and a power supply efficiency of 90% for the DC to AC conversion.

Since we are interested in both the performance and the power consumption of the system, we characterize the energy efficiency with the perf/watt metric. For CPU bound workloads we use 1/energy (performance defined as 1/completion time, divided by the average power), and for I/O bound workloads we use IOPS/watt, which is the number of I/O operations performed per joule.

The cumulative distribution function (CDF) of the power consumption of the two platforms is shown in Figure 1. We use the CDFs to characterize the power consumption profile of both platforms. Each graph shows the power consumption of the processor package and the whole cluster for both platforms. For all workloads, the processor consumes roughly half of the holistic power, confirming previous studies [18]. Also note that the dynamic power ranges of the two platforms are significantly different: Atom has a narrow dynamic range from 9W to 13W, while SNB has a larger dynamic range from 5W to around 150W for the sort and nutch workloads. Since the Atom's package power consumption also includes the power consumed by other components, such as the chipset, memory, SATA, and gigabit Ethernet controller, the idle Atom package power (9W) is slightly greater than the idle SNB package power (5W). The narrow dynamic range of Atom suggests that it consumes a similar amount of power whether it is idle or not, so it is possible to exploit the low power Atom platform for better energy efficiency if, in the end, the performance is not degraded significantly, which is the case for I/O bound workloads as we show later in this section.

Note that although we set up the two clusters based on the ratio of the TDP values (1:7), our measurements reveal that the Atom cluster consumes more power than the SNB cluster: around 1.7x for word count, 2.5x for sort, and 2.05x for the nutch workload. This is because TDP is not a good estimate of the actual power consumption, since the actual consumption varies with several conditions such as the temperature and the workload dynamics. Regardless of this poor estimate, however, our main result still holds: the Atom cluster is more energy efficient for I/O bound workloads because of the significant increase in performance.

We then investigate the job completion time and the perf/watt and show the results in Figure 2. The three workloads have different characteristics: word count is a CPU bound workload with a median CPU utilization of around 80% on Atom nodes and 60% on SNB nodes, sort is an I/O bound workload, and nutch is a balanced workload with a median CPU utilization of around 50% on both platforms. We see that for the CPU bound word count workload the job completion time on the Atom cluster is around 1.3x that of the SNB cluster, and together with the higher power consumption of the Atom cluster this results in 2x better energy efficiency for the SNB cluster. For the I/O bound sort workload the Atom cluster has significantly better job completion time (3.5x), yielding 2.5x better energy efficiency over the SNB cluster and confirming previous research [6]. This shows that there is a class of workloads that is executed more energy efficiently on the Atom nodes, providing a scheduling opportunity that we exploit with the scheduling heuristics in the next section. Finally, for the nutch workload the SNB cluster has 1.7x better energy efficiency than the Atom cluster, despite the Atom cluster having slightly better performance (10%), because the power consumption of the Atom cluster is larger than that of the SNB cluster.
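The linear power model and the energy efficiency metrics described in this section can be summarized with the following sketch. This is our own illustration, not the paper's implementation: the 5% overhead and 90% power supply efficiency come from the text above, the exact way they combine is our reading, and the class structure and all other values are assumptions.

    /** Illustrative sketch of the linear holistic power model and energy efficiency metrics. */
    public final class PowerModel {
      // Per-node component powers (watts), determined offline with micro benchmarks.
      private final double storagePowerW;
      private final double memoryPowerW;
      private final double nicPowerW;

      private static final double OVERHEAD = 0.05;       // unanticipated power consumption
      private static final double PSU_EFFICIENCY = 0.90; // DC to AC conversion

      public PowerModel(double storagePowerW, double memoryPowerW, double nicPowerW) {
        this.storagePowerW = storagePowerW;
        this.memoryPowerW = memoryPowerW;
        this.nicPowerW = nicPowerW;
      }

      /** Holistic node power estimated from the measured processor package power. */
      public double holisticPowerW(double packagePowerW) {
        double componentsW = packagePowerW + storagePowerW + memoryPowerW + nicPowerW;
        // How the 5% overhead and the PSU efficiency are applied is our assumption.
        return componentsW * (1.0 + OVERHEAD) / PSU_EFFICIENCY;
      }

      /** Perf/watt for CPU bound workloads: (1 / completion time) / power = 1 / energy. */
      public static double perfPerWatt(double completionTimeSec, double avgPowerW) {
        return 1.0 / (completionTimeSec * avgPowerW);
      }

      /** IOPS/watt for I/O bound workloads, i.e., I/O operations performed per joule. */
      public static double iopsPerWatt(double totalIoOps, double completionTimeSec, double avgPowerW) {
        return (totalIoOps / completionTimeSec) / avgPowerW;
      }
    }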

Figure 1: Cumulative distribution function (CDF) of the power consumption of the processor package (Atom/SNB Node) and the whole cluster (Atom/SNB Cl.) for both the Atom and the Sandy Bridge (SNB) platforms: (a) word count, (b) sort, (c) nutch. (Plots omitted; the x-axes show power consumption from 0 to 250 watts, the y-axes the CDF in percent.)

Figure 2: Job completion times (top) and the normalized energy efficiency (bottom) for the word count, sort, and nutch workloads on both platforms. (Plots omitted.)

4. SCHEDULING HEURISTICS
In the previous section we showed that the SNB nodes are more energy efficient for CPU bound workloads and the Atom nodes are more energy efficient for I/O bound workloads, thus making a case for heterogeneity. Unfortunately, existing Hadoop schedulers (i.e., the FIFO scheduler, the fair scheduler, and the capacity scheduler) do not consider heterogeneity when making scheduling decisions. Therefore, in this work we fill this gap by proposing several scheduling heuristics that exploit heterogeneity for better energy efficiency. Our scheduling heuristics determine which tasks should run on the wimpy nodes in the cluster, and they address the nontrivial trade-off between power and performance: although running a particular task on an Atom node may result in lower power consumption, it may also result in longer job completion times, degrading the performance and increasing the total energy consumption significantly. Toward this end, our heuristics characterize the energy efficiency of the nodes in the cluster using the records/joule and IOPS/watt metrics and use these metrics to make scheduling decisions; we describe in Section 5.1 how our schedulers collect these metrics.

While designing our heuristics we made two assumptions. First, we assumed that the characteristics of the workload are not known a priori, making our problem an online scheduling problem. Second, we assumed that the cluster is shared by multiple users, which is the case for production Hadoop deployments [22]; therefore our heuristics consider fairness as an important concern. We describe our scheduling heuristics in turn.

Default Scheduler (Default): We use the default Hadoop scheduler (Section 2) as the baseline in our performance evaluation.

Energy Efficient Scheduler (EESched): This greedy heuristic schedules a task to the most energy efficient node for that task type, either map or reduce. We define the most energy efficient node as the node with free slots and with the maximum records per joule for map tasks, or the maximum IOPS/watt for reduce tasks. Since map tasks mostly involve computation and reduce tasks mostly involve I/O, we believe that these metrics characterize the energy efficiency well. The intuition behind EESched is that tasks are scheduled to the most energy efficient node in the cluster until this node is no longer the most energy efficient one (i.e., its energy efficiency metrics deviate from their best values), so the heuristic does its best to operate the nodes close to their most energy efficient operating points. When EESched receives a heartbeat from a node it first schedules the map tasks and then the reduce tasks. After determining the most energy efficient node for a task, the heuristic checks whether the current heartbeat was received from that node, and if this is not the case the heuristic does not schedule any tasks to this node. Then, EESched sorts the runnable jobs in the queue by their number of running tasks for fairness, and determines the number of tasks to schedule using the number of free map/reduce slots of the node that sent the current heartbeat. The heuristic then traverses the job queue and schedules the map tasks while taking data locality into consideration; if no node-local map task is found for a node it considers rack locality, and if neither is possible it considers off-rack tasks. Note that EESched is a greedy heuristic that makes locally optimal decisions, and these decisions may be far from the global optimum, which we leave as future work. A simplified sketch of EESched's node selection is shown at the end of this section.

Energy Efficient Scheduler with Locality (EESched+Locality): We modified EESched for better locality for the map tasks. A task tracker TT is assigned a task t if TT contains an input split of t and TT is the most energy efficient node for t, where we define the most energy efficient node as before. With this heuristic, a map task is guaranteed to be executed on a node that contains an input split for this task, yielding better locality compared to the other heuristics. Our motivation for evaluating this heuristic is to investigate whether better locality results in better energy efficiency, as we expect better locality to result in better job completion times.

Run Reduce Phase on Wimpy Nodes (RoW): As shown in the previous section, wimpy nodes are more energy efficient for I/O bound workloads. Therefore, the intuition behind this heuristic is that running the whole reduce phase, which is mostly I/O bound, on the wimpy nodes may result in better energy efficiency for the reduce phase, and hence for the whole workload. Note that RoW only modifies the scheduling of the reduce tasks; it uses Default for scheduling the map tasks.
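Below is the simplified sketch of EESched's node selection referenced above. It is our reconstruction for illustration only, not the actual modified Hadoop scheduler; the Node abstraction and its accessors are hypothetical.

    import java.util.List;

    /** Simplified sketch of EESched's node selection (not the paper's implementation). */
    final class EESched {
      enum TaskType { MAP, REDUCE }

      /** Hypothetical view of a task tracker with its latest energy efficiency metrics. */
      interface Node {
        boolean hasFreeSlot(TaskType type);
        double recordsPerJoule(); // reported by the task tracker at every heartbeat
        double iopsPerWatt();     // reported by the task tracker at every heartbeat
      }

      /**
       * Returns the most energy efficient node with a free slot for the given task type:
       * maximum records/joule for map tasks, maximum IOPS/watt for reduce tasks.
       */
      static Node mostEnergyEfficientNode(List<? extends Node> nodes, TaskType type) {
        Node best = null;
        double bestMetric = -1.0;
        for (Node n : nodes) {
          if (!n.hasFreeSlot(type)) continue;
          double metric = (type == TaskType.MAP) ? n.recordsPerJoule() : n.iopsPerWatt();
          if (metric > bestMetric) {
            bestMetric = metric;
            best = n;
          }
        }
        return best;
      }

      /** A heartbeat from 'sender' only receives tasks if it is currently the best node. */
      static boolean shouldAssign(Node sender, List<? extends Node> nodes, TaskType type) {
        return sender == mostEnergyEfficientNode(nodes, type);
      }
    }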

5. PERFORMANCE EVALUATION
5.1 Experimental Setup
We evaluate our heuristics on a heterogeneous Hadoop cluster comprising twenty Intel Atom nodes and three Intel SNB nodes (see Section 3 for the processor specifications); the ratio of SNB nodes to Atom nodes is thus roughly 1:7, as in Section 3. All nodes reside on the same rack and are connected by a single gigabit Ethernet switch. Before performing the experiments we did our best to optimize the performance of our cluster by running several experiments and carefully tuning the configuration parameters.

We have developed the necessary tools to measure the package power on the Atom and SNB nodes. During the experiments our measurement tools run in the background on all nodes and export the package power measurements to be used by the task trackers. We performed our experiments with Hadoop 0.20.0, to which we made several modifications to implement the heuristics described in Section 4. With our modifications, the task trackers collect the package powers using our measurement infrastructure and derive the energy efficiency metrics, records/joule and IOPS/watt, at fixed time intervals. To determine these metrics the task trackers estimate the holistic power using the power model described in Section 3 and the measured package powers. Finally, the task trackers send these metrics to the job tracker at every heartbeat, and the job tracker uses them for its scheduling decisions.

Since production Hadoop clusters are shared (Section 4), we use a mix of workloads in lieu of a single application to assess the heuristics under realistic scenarios. The workload mix comprises 25 jobs, and the job inter arrival time follows an exponential distribution with a mean of 14 s [22]. Each job in the mix is randomly picked from the word count, sort, and nutch applications described in Section 3, so that in the end the workload contains roughly the same number of jobs for each application. Each job in the mix has 15 GB of input data to process; in total the workload comprises around 4900 map tasks and 800 reduce tasks, and it takes roughly 2.5 hours with the Default Hadoop scheduler to process the complete workload.
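For illustration, a workload mix with these parameters could be generated along the following lines. This is our own sketch: the job count, mean inter arrival time, application set, and input size come from the text above, while the submit() hook and the fixed random seed are hypothetical.

    import java.util.Random;

    /** Illustrative generator for the shared-cluster workload mix (not the paper's tooling). */
    final class WorkloadMix {
      private static final String[] APPS = {"wordcount", "sort", "nutch"};
      private static final int NUM_JOBS = 25;
      private static final double MEAN_INTERARRIVAL_SEC = 14.0;

      public static void main(String[] args) throws InterruptedException {
        Random rng = new Random(42);
        for (int i = 0; i < NUM_JOBS; i++) {
          // Exponentially distributed inter arrival times with the given mean.
          double waitSec = -MEAN_INTERARRIVAL_SEC * Math.log(1.0 - rng.nextDouble());
          Thread.sleep((long) (waitSec * 1000));
          String app = APPS[rng.nextInt(APPS.length)]; // roughly equal share per application
          submit(app, 15); // each job processes 15 GB of input
        }
      }

      // Hypothetical hook that would launch the corresponding Hadoop job.
      private static void submit(String app, int inputSizeGB) {
        System.out.println("submitting " + app + " with " + inputSizeGB + " GB of input");
      }
    }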

Figure 3: Workload completion time (top) and the normalized energy consumption (bottom) for the various scheduling heuristics. (Plots omitted.)

5.2 Results
We assess the workload completion time and present the results in Figure 3 (top). First, we observe that all heuristics improve the workload completion time compared to Default: EESched by 30%, EESched+Locality by 15%, and RoW by 5%. These results suggest that scheduling tasks according to the energy efficiency of the nodes (EESched) helps to reduce the total workload completion time. However, when we favor better locality in the scheduling decisions (EESched+Locality), the improvement in completion time over Default is smaller than with EESched. This shows that the nodes that contain the input data for a map task are not necessarily the best performing nodes, as HDFS replicates the data randomly. For example, with EESched+Locality a compute intensive map task may run on a wimpy node, which may increase the completion time of the job noticeably. Finally, running the whole reduce phase on the wimpy nodes (RoW) reduces the completion time slightly, due to the improvements in the completion times of the individual reduce tasks; we have already shown that Atom nodes perform better than SNB nodes for I/O bound workloads (Section 3).

In Figure 3 (bottom), we evaluate the energy efficiency of our heuristics and show their energy consumption normalized to the Default scheduler. Considering energy efficiency metrics during scheduling and assigning tasks to the most energy efficient node (EESched) pays off and reduces the total energy consumed by the workload execution significantly (by 27%). The reason is that EESched schedules each type of task (map or reduce) to the most energy efficient node for that particular task type and does its best to operate the nodes close to their most energy efficient operating points. Similar to the results for the workload completion time (Figure 3 (top)), when we favor locality (EESched+Locality) the improvement in energy consumption over Default is smaller than with EESched (around 15%); the nodes that contain the input splits for a job are not necessarily the most energy efficient nodes, due to the way HDFS replicates data. Therefore, it is an interesting direction for future work to investigate energy efficient replication strategies and how to couple them with our energy efficient scheduling heuristics. Finally, a very simple modification to the scheduler (RoW) improves the energy efficiency by roughly 20%. The reason for the lower energy savings with RoW compared to EESched is that RoW only considers energy efficiency for the reduce tasks, while EESched considers energy efficiency for both map and reduce tasks.

6. RELATED WORK
There has been an increasing amount of research on energy efficiency in data centers. Several studies have proposed solutions that include turning machines off [16, 13]. However, this approach raises concerns about the availability of replicated data, and the problem of deciding which machines to turn off is harder for heterogeneous clusters. Similarly, instead of turning servers off, putting them into low power states has also been investigated in previous work [17]. Recently the feasibility of using wimpy nodes was demonstrated [1, 6]. However, these studies led to interesting discussions [14, 9] arguing that other concerns, such as the additional software/hardware costs introduced by using a large number of wimpy nodes and the requirements of latency sensitive workloads, should also be taken into account.

Closest to our work, several studies have investigated energy efficient workload placement. In [21] the authors present eDryad, which learns the workload characteristics and places jobs with complementary resource requirements on the same node, in the end demonstrating lower energy consumption than the default Dryad scheduler. In [15] the authors address load distribution across several data centers; they propose several policies, including an optimization approach, for managing the energy consumption and cost of Internet services while satisfying the service level agreements. Similarly, in [19] the authors propose a bin packing heuristic for energy aware workload consolidation that maximizes the sum of the Euclidean distances of the current placement to the optimal point at each server. Finally, in [20] the authors investigate the problem of power aware workload placement on heterogeneous virtual clusters. They propose and evaluate a bin packing heuristic that considers both power and migration cost constraints, and they demonstrate the efficacy of their solution with both theoretical and experimental analyses.

There are also several studies on improving the energy efficiency of HDFS. GreenHDFS [11] partitions the servers into cold and hot zones based on the popularity of the files, and saves energy by putting the servers in the cold zone into a low power mode. Similarly, in [16, 13] the authors propose a covering set strategy that replicates data blocks on a subset of the servers and powers down the remaining servers to save energy, as well as a strategy that uses all the nodes in the cluster to execute the workload and then powers down the entire cluster. These studies have shown dramatic energy improvements at the file system level and are orthogonal to the scheduling heuristics that we propose in this work.

Our work differs from previous work on energy efficient scheduling in two ways. First, to the best of our knowledge, none of the previous work exploits the heterogeneity of MapReduce clusters comprising both wimpy and brawny nodes together with real-time power measurements in the scheduler to improve energy efficiency. Second, the closest previous work either investigates offline scheduling with complete workload information [19] or assumes that the same set of jobs is executed repeatedly in the cluster and exploits this fact to determine the workload requirements at runtime [21]. Our assumptions are more realistic: we address the online version of the energy efficient scheduling problem, where the characteristics of the workload are not known a priori, and we evaluate our heuristics with a workload mix that emulates production MapReduce workloads, as we assume that the cluster is shared by multiple users.

7. DISCUSSION AND FUTURE WORK
In this work, we started investigating whether energy aware scheduling strategies that exploit the heterogeneity in MapReduce clusters comprising wimpy nodes, such as the Intel Atom processors, and brawny nodes, such as the Intel SNB processors, can improve energy efficiency. Toward this end, in Section 3 we first characterized the performance and energy efficiency of various MapReduce workloads on both Atom and SNB nodes and showed that wimpy nodes are more energy efficient for I/O bound workloads. Then in Section 5 we showed that it is possible to exploit heterogeneity and real-time power measurements in the scheduler, and we demonstrated up to 27% improvements in energy efficiency over the default Hadoop scheduler with simple scheduling heuristics.

Although our work and several previous studies [1, 6] demonstrate the feasibility of using wimpy nodes to improve energy efficiency, there are also other concerns that should be taken into account when deploying wimpy nodes in a data center [9, 14]. These concerns include system administration, hardware and software development costs, fault tolerance for clusters comprising a large number of wimpy nodes, and wimpy nodes not being suitable for latency sensitive workloads, where the software infrastructure has to be carefully optimized for the wimpy nodes to meet the Service Level Agreements (SLAs).

Our work raises the following interesting research questions that we plan to address in future work.

Degree of heterogeneity in a cluster: How will our scheduling heuristics perform in a larger cluster where the ratio of wimpy to brawny nodes is considerably different from 7:1? What will be the effect on performance and energy efficiency if more than two types of platforms are deployed in the cluster? Moreover, given a fixed power or cost budget, what is the optimal number of nodes of each type to deploy in a cluster such that the overall energy efficiency is maximized?

Impact of different mixes of workloads: How will our heuristics perform with different workloads, such as mixes where all tasks are I/O or CPU bound, or mixes in between?

Impact of replication: Our results show that the heuristic favoring data locality (EESched+Locality) was less energy efficient than the other heuristics, probably because Hadoop does not consider the energy efficiency of the nodes during replication. What will be the impact of an energy efficient replication strategy on the resulting performance and energy consumption? Recent studies [11, 16, 13] have already proposed solutions at the HDFS level that result in better energy efficiency.

Straggler tasks: How can the scheduler detect straggler tasks in a heterogeneous cluster and handle them while still meeting the SLAs? Recent work [23] has already investigated techniques for detecting straggler tasks in heterogeneous clusters.

SLAs: Data processing frameworks, such as Bigtable [5] and HBase, are already being used for real-time data processing. Therefore, a common use case includes latency requirements for the jobs submitted by different users. How can we design a scheduler that improves the overall energy efficiency while fulfilling these SLAs?

8. REFERENCES
[1] D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A fast array of wimpy nodes. In SOSP, pages 1–14, 2009.
[2] Apache Hadoop Project. http://hadoop.apache.org/.
[3] Apache Nutch Project. http://nutch.apache.org/.
[4] C. Belady. In the data center, power and cooling costs more than the IT equipment it supports, February 2007.
[5] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI, 2006.
[6] B.-G. Chun, G. Iannaccone, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini. An energy case for hybrid datacenters. SIGOPS Oper. Syst. Rev., 44:76–80, March 2010.
[7] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51:107–113, January 2008.
[8] U.S. EPA. Report to Congress on server and data center energy efficiency, August 2007. U.S. Environmental Protection Agency, Tech. Rep.
[9] U. Hölzle. Brawny cores still beat wimpy cores, most of the time. research.google.com/pubs/archive/36448.pdf.
[10] S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In Intl. Conference on Data Engineering Workshops, pages 41–51, March 2010.
[11] R. T. Kaushik and M. Bhandarkar. GreenHDFS: Towards an energy-conserving, storage-efficient, hybrid Hadoop compute cluster. In HotPower, pages 1–9, 2010.
[12] J. Koomey. Growth in data center electricity use 2005 to 2010, August 2011.
[13] W. Lang and J. M. Patel. Energy management for MapReduce clusters. Proc. VLDB Endow., 3:129–139, September 2010.
[14] W. Lang, J. M. Patel, and S. Shankar. Wimpy node clusters: What about non-wimpy workloads? In Intl. Workshop on Data Management on New Hardware, pages 47–55, 2010.
[15] K. Le, R. Bianchini, M. Martonosi, and T. D. Nguyen. Cost- and energy-aware load distribution across data centers. In HotPower, 2009.
[16] J. Leverich and C. Kozyrakis. On the energy (in)efficiency of Hadoop clusters. SIGOPS Oper. Syst. Rev., 44:61–65, March 2010.
[17] D. Meisner, B. T. Gold, and T. F. Wenisch. PowerNap: Eliminating server idle power. SIGPLAN Not., 44:205–216, March 2009.
[18] S. Pelley, D. Meisner, T. Wenisch, and J. VanGilder. Understanding and abstracting total data center power. In Workshop on Energy Efficient Design, 2009.
[19] S. Srikantaiah, A. Kansal, and F. Zhao. Energy aware consolidation for cloud computing. In HotPower, 2008.
[20] A. Verma, P. Ahuja, and A. Neogi. pMapper: Power and migration cost aware application placement in virtualized systems. In Middleware, pages 243–264, 2008.
[21] W. Xiong and A. Kansal. Energy efficient data intensive distributed computing. IEEE Data Eng. Bull., 34(1):24–33, 2011.
[22] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In EuroSys, pages 265–278, 2010.
[23] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In OSDI, pages 29–42, 2008.