Dell EMC Isilon Best Practices for Hadoop Data Storage

The Dell EMC® Isilon® scale-out network-attached storage (NAS) platform provides Hadoop clients with direct access to big data through a Hadoop File System ...
ABSTRACT

This white paper describes the best practices for setting up and managing the HDFS service on a Dell EMC Isilon cluster to optimize data storage for Hadoop analytics.

October 2016

The information in this publication is provided “as is.” DELL EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any DELL EMC software described in this publication requires an applicable software license.

DELL EMC², DELL EMC, and the DELL EMC logo are registered trademarks or trademarks of DELL EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

© Copyright 2016 DELL EMC Corporation. All rights reserved. Published in the USA. White paper H12877

DELL EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.

DELL EMC is now part of the Dell group of companies.


TABLE OF CONTENTS

INTRODUCTION
  OVERVIEW OF ISILON SCALE-OUT NAS FOR BIG DATA
  HOW HADOOP WORKS WITH ISILON SCALE-OUT NAS
  NAMENODE REDUNDANCY
  RACK AWARENESS
  THE HDFS ARCHITECTURE OF ONEFS

HDFS SETUP
  SUPPORTED DISTRIBUTIONS
  OVERVIEW OF HOW TO INTEGRATE AN ISILON CLUSTER WITH HADOOP
  SIZING AN ISILON CLUSTER FOR A HADOOP COMPUTE GRID
  MAXIMIZING NETWORK THROUGHPUT WITH 10 GIGABIT ETHERNET
  DO NOT RUN THE NAMENODE AND DATANODE SERVICES ON CLIENTS
  OVERVIEW OF ISILON HDFS COMMANDS
  THE HDFS SERVICE AND LOGS
  CREATING DIRECTORIES AND SETTING PERMISSIONS
  SETTING THE ROOT PATH
  SETTING UP A SMARTCONNECT ZONE FOR NAMENODES

WORKING WITH DIFFERENT HADOOP DISTRIBUTIONS
  CLOUDERA
  PIVOTAL HD

TUNING ONEFS FOR HDFS OPERATIONS
  BLOCK SIZES
  TUNING THE NUMBER OF HDFS THREADS
  OBTAINING STATISTICS TO TUNE ONEFS FOR A HADOOP WORKFLOW