SQL on Apache Hadoop benchmarks Apache Hive LLAP and Kognitio ...

SQL on Hadoop benchmarks using TPC-DS query set: Hive LLAP & Kognitio
Technical Information Sheet

SQL on Apache Hadoop benchmarks Apache Hive LLAP and Kognitio 8.2

This document supports the “SQL on Apache Hadoop benchmarks – Apache Hive LLAP and Kognitio 8.2” whitepaper.¹ It contains the following information:

• Benchmark Architecture – details of the 9 node AWS system used in all the benchmarks
• Benchmark Deployment – overview and schematic of the benchmarks for each platform: Hive LLAP and Kognitio
• Individual Query Timings – the query timings for each of the 99 TPC-DS queries on each of the platforms

Benchmark Architecture

The two benchmarks, Apache Hive LLAP and Kognitio, were executed on the same 9 node system. The hardware consisted of standard AWS instances. The infrastructure was deployed using Hortonworks Data Cloud (HDC), available on Amazon Marketplace, which allows you to select a Hortonworks deployment from a list of options. Details of how to deploy HDC can be found at: https://hortonworks.github.io/hdp-aws/index.html#get-started

For the benchmarks, one m3.2xlarge was deployed as the edge node along with eight r3.8xlarge data nodes. Each data node has the following specifications:

• 640GB available disk
• 244GB RAM
• 32 cores
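As a quick check on the scale of the system, the per-node figures above can be multiplied out across the eight data nodes. This is a minimal sketch; the variable names are illustrative, and the edge node is left out because its specification is not listed here.

```python
# Per-node figures come from the data node specification list;
# the node count is the eight r3.8xlarge data nodes.
DATA_NODES = 8
DISK_GB_PER_NODE = 640
RAM_GB_PER_NODE = 244
CORES_PER_NODE = 32

# Aggregate resources across all data nodes (edge node excluded).
total_disk_gb = DATA_NODES * DISK_GB_PER_NODE
total_ram_gb = DATA_NODES * RAM_GB_PER_NODE
total_cores = DATA_NODES * CORES_PER_NODE

print(f"Aggregate disk:  {total_disk_gb} GB")
print(f"Aggregate RAM:   {total_ram_gb} GB")
print(f"Aggregate cores: {total_cores}")
```

The data nodes together therefore provide 5120GB of disk, 1952GB of RAM and 256 cores.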

We deployed the EDW-Analytics option in HDC, which deploys Apache Hive 2 LLAP automatically, so we did not have to do any set-up or configuration.

Benchmark Deployment

Each of the benchmarks was run with the other platform stopped. This allowed the platform under test to utilise all of the available resources during the benchmark.

In all cases the 1TB TPC-DS data set was generated using the data generator (dsdgen) provided as part of the TPC-DS benchmarking tool suite, and the TPC-DS query generation tool (dsqgen) was used to generate the queries. This tool generates a script for each query stream that randomises the order of the 99 queries in each script. The tool is also designed to insert randomised values for parameters in each of the queries. This ensures the benchmark is a truly mixed workload. For more details on how this tool works see http://www.tpc.org/tpcds/.

Small syntax changes were made, such as adding aliases for derived tables, renaming columns, renaming group by and sort by columns, and editing where reserved words were used, but query rewriting was not allowed.
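The idea of a per-stream randomised query order can be illustrated with a short sketch. This is not dsqgen's own permutation algorithm and it does not perform parameter substitution; the function name `build_stream`, the seed value and the stream count of 10 are assumptions chosen only for illustration.

```python
import random

NUM_QUERIES = 99   # the 99 TPC-DS queries
NUM_STREAMS = 10   # assumption: illustrative stream count only


def build_stream(stream_id: int, base_seed: int = 2017) -> list:
    """Return one randomised ordering of query numbers 1..99 for a stream."""
    # A separate, seeded generator per stream keeps each ordering repeatable.
    rng = random.Random(base_seed + stream_id)
    order = list(range(1, NUM_QUERIES + 1))
    rng.shuffle(order)
    return order


streams = [build_stream(i) for i in range(NUM_STREAMS)]

for i, order in enumerate(streams):
    # Every stream contains each query exactly once, only the order differs.
    assert sorted(order) == list(range(1, NUM_QUERIES + 1))
    print(f"stream {i}: first five queries {order[:5]}")
```

Because each stream runs the same 99 queries in a different order, concurrent streams are unlikely to be executing the same query at the same moment, which is what makes the workload mixed.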

¹ You can download the whitepaper from: https://kognitio.com/resources/whitepapers/hive-llap-kognitio-benchmarking-usingtpc-ds-query-set/

Published August 2017


Kognitio

Notes:

• Kognitio version 8.2.0-rel20170616 was used. This is the current version available for download at http://kognitio.com/on-hadoop/.
• Kognitio is a standard YARN application deployed from the edge node.
• Data was held in Kognitio RAM view images. The larger data sets were hashed on the columns most commonly used in the joins. These reside within the Kognitio YARN containers and can be utilised by multiple queries.
• Kognitio statistics were collected on all views.
• Queries were submitted from the edge node using the Kognitio command line tool wxsubmit for each of the randomised query streams in the benchmark.
• Each query is executed within all containers in the remaining RAM available (not utilised by view images).
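The note on hashing the larger data sets on their join columns can be illustrated with a generic sketch of hash distribution. This is not Kognitio's internal hashing scheme; the CRC32 hash, the helper names `node_for` and `distribute`, the sample rows and the choice of an item key as the join column are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_NODES = 8  # the eight data nodes in the benchmark system


def node_for(key: int) -> int:
    """Map a join-key value to a node using a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


def distribute(rows, key_index):
    """Group rows by the node that owns each row's join key."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key_index])].append(row)
    return placement


# Hypothetical sample rows: (item key, label).
store_sales = [(k, f"sale-{n}") for n, k in enumerate([11, 42, 42, 7, 99, 11])]
store_returns = [(k, f"return-{n}") for n, k in enumerate([42, 7, 11])]

sales_by_node = distribute(store_sales, 0)
returns_by_node = distribute(store_returns, 0)

# Each node joins only the rows it already holds; equal keys always hash
# to the same node, so no row has to move between nodes for this join.
joined = []
for node, returns in returns_by_node.items():
    for r_key, r_label in returns:
        for s_key, s_label in sales_by_node.get(node, []):
            if s_key == r_key:
                joined.append((r_key, s_label, r_label))

print(sorted(joined))
```

When both sides of a join are distributed on the same key, matching rows are co-located, which is the usual reason for hashing large tables on their most common join columns.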


Apache Hive LLAP

Notes:

• Apache Hive2 LLAP was deployed with Hive Version 1.2.1. This was shipped as part of HDP2.6 (cloud) used in the Hortonworks Data Cloud deployment.
• Hive LLAP was set up and configured automatically when selecting the EDW-Analytics option.
• The only changes made to configuration were to allow 10 concurrent queries. This was done in Ambari and all recommended changes to the underlying configuration resulting from this change were accepted.
• Hiv