STORAGE
MANAGING THE INFORMATION THAT DRIVES THE ENTERPRISE
MAY 2016, VOL. 15, NO. 3

Everything you need to know about object storage
Object storage is moving out of the cloud to nudge NAS out of enterprise file storage

EDITOR'S NOTE / CASTAGNA: Data storage growing faster than capacity … for now
STORAGE REVOLUTION / TOIGO: Hyper-consolidated is the future
SNAPSHOT 1: Need for more capacity driving most NAS purchases
CLOUD BACKUP: Tap the cloud to ease backup burdens
SNAPSHOT 2: Users favor venerable NFS and 10 Gig for new NAS systems
HADOOP: Big data requires big storage
HOT SPOTS / SINCLAIR: When simple storage isn't so simple
READ-WRITE / MATCHETT: The sun rises on transformational cloud storage services

EDITOR'S LETTER / RICH CASTAGNA

Data storage growing faster than capacity
Future storage media may store petabytes for eons, but the capacity battle is happening now.


JUST ABOUT A year ago, I got all jazzed about a new discovery that could turn DNA into a storage medium. That was pretty cool—the stuff of life getting turned into the equivalent of a 21st century floppy disk, but with a lot more capacity and even more intriguing scenarios. Scientists speculated that a mere gram of DNA could potentially hold 2 TB of data. Not bad for a medium that until now only handled mundane tasks like determining the color of our eyes and whether or not we were going to lose our hair eventually.

Now the lab coat set has come up with another ingenious alternative to boring old magnetic data storage media or stolid solid state. In news reported by a number of sources, including Research and Markets, the arrival of "quartz coin data storage" has been widely heralded.

Not to be confused with bitcoins, each of these coins (developed at the University of Southampton in the UK) is said to be capable of holding 360 TB of data on a disk that appears to be only a bit larger than a quarter. That's a lot of data in a small space, and it makes that 32 GB USB flash drive on your keychain seem pretty puny.

According to the Research and Markets press release, here's how they did it: "The femtosecond laser encodes information in five dimensions. The first three dimensions of X, Y and Z denote the nano-grating's location within the coin, while the 4th and 5th dimension exist on the birefringence patterns of the nano-gratings." Well, sure, if you're going to resort to using birefringence patterns and nanogratings, then that kind of capacity in that small of a space is no surprise. (Confession: I have no idea what birefringence patterns and nanogratings are.)

All those terabytes on a small coin are pretty impressive, but what's even more staggering is the predicted life expectancy of this form of data storage media: 13.8 billion years. That's about the current age of the universe! Sure, that number is theoretical, but even if it's only a measly one billion years, I'm impressed. I just hope I'm around to check on the accuracy of that forecast.


If you're starting to think about reconfiguring your data center to house a lot of little quartz coins, you might want to slow down a bit. I don't expect WD or Seagate are likely to be stamping out quartz coin drives any time soon. But maybe you don't really need all that extra capacity anyway.

HDD CAPACITIES ON THE DECLINE

Some market research firms have reported that the total hard disk capacity recently sold is down from previous periods. For example, TrendFocus noted that total HDD capacity shipped declined sharply from about mid-2014 to the middle of 2015. If you think that cloud and flash are responsible for that dip and are filling the gap, that doesn't appear to be entirely true. What's more likely is that storage managers are finally trying to get the upper hand in the struggle to control data storage media capacity. And with new capacity requirements looming with big data analytics and IoT apps, it's kind of a now-or-never proposition: Take control or take cover.

Our storage purchasing surveys reveal that storage buyers have been better at planning ahead over the past six or seven years by buying bigger arrays and then filling in with additional drives as needed. Similarly, while flash is likely taking up all the slack of slumping hard disk drive (HDD) sales, it's also likely that solid state is affecting storage buying habits. In the past, to squeeze out required performance, a storage array might've been configured with dozens of short-stroked 15K rpm disks, but now the same—or better—performance can be achieved with far fewer flash devices. So the net-net there is better performance and a smaller capacity purchase. Tape—you know, that type of data storage media that's been declared dead umpteen times in the past decade—could also be making a difference due to its generous storage capacities absorbing some of the capacity previously destined for disk.

CLEAR OUT ROTTEN STORAGE

Veritas' recent (and oddly named) Global Databerg Report declared that only 15% of the data a typical company has stashed away on its storage systems is business-critical stuff. And the report called 52% of the stored information "dark data" because its value was unknown. The rest of the data (33%) is described as "redundant, obsolete or trivial," or ROT—definitely one of the best acronyms I've seen in some time. I don't really know how accurate Veritas' numbers are, but I bet they're at least in the ballpark for most companies. In our most recent purchasing survey, respondents indicated that they manage an average of 1.4 petabytes of data on all forms of data storage media, including disks, tape, flash, cloud, optical and whatever else you can lay a few bits and bytes on. If 33% of that is indeed ROT, that means companies are paying for the care and feeding of about half a petabyte of junk.

Perhaps data centers have begun to get the ROT out. Some of the newer and smarter storage systems make the chore of finding and deleting rotten data easier by providing more intelligence about the stored information.


It's also possible that tape, our favorite dead data storage media, has been resurrected for archival purposes, and both dark data and ROT have found a final resting place. The LTO Ultrium Consortium, which effectively produces the only tape game in town with its LTO format, reported that vendors shipped 78,000 PB of compressed LTO tape capacity in 2015.

In any event, when you put the ever-growing capacity demands and the apparent downturn in disk capacity sales side by side, you have to conclude that we're all getting at least a little better at what we save and what we dump. And we'd better get used to it—it looks like selective saving will be the way of life if we want to survive the imminent big data/IoT data deluge.

RICH CASTAGNA is TechTarget's VP of Editorial.


STORAGE REVOLUTION / JON TOIGO

Hyper-consolidated is the future
IBM mainframe illuminates a path toward more manageable and cost-efficient hyper-consolidated IT infrastructure.


MANY RECENT CONVERSATIONS with IT folks confirm and reinforce the trend away from the industry darling "hyper-converged infrastructure" (and, by extension, hyper-converged appliances) meme of the past 18 months or so toward something best described as "hyper-consolidation." A friend of mine in Frankfurt, Germany—Christian Marczinke, vice president of solution architecture at DataCore Software—coined the term (to give credit where credit is due). But I first heard about hyper-consolidation when I interviewed IBM executives at the IBM Interconnect 2016 conference in Las Vegas in February.

At that event, IBM introduced its latest mainframe, the z13s, which is designed to host not only the legacy databases that are the mainstay of mainframe computing, but also so-called "systems of insight" (read: big data analytics systems) traditionally deployed on sprawling, mostly Linux-based cluster farms in which individual nodes consist of a server and some internal or locally attached storage. IBM made a pretty compelling case that all of those x86 servers could be consolidated into VMs (or KVMs) running on its new big iron platform. Through the combination of a lower cost and more Linux-friendly mainframe with the integration of lots of open source technologies already beloved by big data- and Internet of Things-philes, the resulting hyper-consolidated infrastructure would cost companies less money to operate than hyper-converged appliances and facilitate their transition into the realm of hybrid clouds. But it would do all of this without the unknowns and insecurities that typically accompany cloudification.

DISAPPOINTMENTS OF CONSOLIDATIONS PAST

Back in 2005, leading analysts actually produced charts suggesting that hypervisor computing would enable such high levels of server-infrastructure consolidation that by 2009 Capex spending would all but flat line, while significant Opex cost reductions would start to be seen in every data center. It never happened.


Then 2009 came and went and Capex cost kept right on growing, in part because leading hypervisor vendors blamed legacy storage (SANs, NAS and so on) for subpar virtual machine performance, telling their users that they needed to rip and replace all storage in favor of direct-attached storage cobbles. Over time, this idea morphed into software-defined storage, which vendors also touted as "new," but was actually a re-visitation of System-Managed Storage from much earlier mainframe computing days. After a few false starts, the industry productized SDS as hyper-converged infrastructure (HCI) appliances.

Since then, the trade press has been filled with advertorials subsidized by server vendors-qua-HCI appliance vendors talking about their "HCI cluster nodes," combining commodity server and storage stuff with lots of hypervisor and SDS software licenses, as though they were the new normal in agile data center architecture and the Lego™ building blocks of clouds. Only, deploying a bunch of little server nodes—as IBM's z13s play suggests—is not really consolidating much of anything. Nor is it really reducing much cost. Even as improvements are made in orchestration and administration of such infrastructures, the result has been a return to the isolated-island-of-data problem that companies sought to address in the late 1990s with SANs.

ENTER HYPER-CONSOLIDATION

One way to clean up this mess with hyper-converged appliances is to return to the mainframe. IBM facilitates this with a hardware platform, the z13s, rooted deeply in multiprocessor architecture and engineered for application multi-tenancy. And to make it palatable to Millennials who don't know symmetrical multiprocessing from Shinola, they have ladled on (in the form of an architecture called LinuxONE) support for all of the Linux distributors and open source appdev tools, cloudware, analytics engines and in-memory databases that they could lay their hands on. The idea is to consolidate a lot of x86 platform workloads and storage into the more robust, reliable and secure compute and storage of the z Systems ecosystem. IBM investment protects the system through "elastic pricing," which means you can get your money back if not satisfied after the first year. (Interestingly, the presentations I saw at the IBM conference pointed out that users were realizing superior ROI to either cloud computing or x86 computing models after only three years with IBM's mainframe platform.)

All in all, though, it is clear that IBM's idea has a lot of appeal—both to legacy "systems of record" managers (the overseers of traditional ERP, MRP, CRM and other workloads who like the reliability and security of the mainframe) and the appdevers and cloudies who prize Agile development over all else.

BIG IRON ISN'T FOR EVERYONE

Now, you don't need to use a mainframe to do hyper-consolidation. Not every company has the skills on staff to run a mainframe, or the coin to finance the acquisition of big iron—with or without elastic pricing. Marczinke notes, for instance, that his clients are simply looking for real savings from consolidation that hypervisor vendors promised but didn't deliver. He may be right.


They are just as interested in hyper-consolidation, but want to remain in the aegis of the x86 hardware and hypervisor software technologies they're more familiar with. These folks need something else: not just a sprawling infrastructure comprising hyper-converged appliances that are each a data silo with a particular hypervisor vendor's moat and stockade surrounding their workload and data. They want to embrace consolidated storage—call it something other than SANs if you want—so it is less costly to manage, and they want to use locally attached storage where that works. But they want all of that to be manageable from a single pane of glass by someone who knows little or nothing about storage.

Thankfully, some cool things are coming down the pike in the realm of hyper-consolidation. If I am reading the tea leaves correctly, what DataCore has already done with Adaptive Parallel I/O on individual hyper-converged appliances, for example, could very well be on its way to becoming much more scalable, creating—via software—a mechanism to deliver application performance and latency reduction at the cluster level. Think of it as "no-stall analytics for the rest of us." Ultimately, this kind of hyper-consolidation may be a winner across a very broad swath of organizations that aren't willing to outsource their futures to the cloud and can't stretch budgets to embrace IBM's most excellent z Systems platform.

JON WILLIAM TOIGO is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.


OBJECT STORAGE

What you need to know about object storage
A mainstay of cloud services, object storage is nudging NAS out of enterprise file storage. BY JACOB GSOEDL

OBJECT STORAGE IS the latest alternative to traditional file-based storage, offering greater scalability and (potentially) better data management with its extended metadata. Until recently, however, object storage has largely been a niche technology for enterprises while simultaneously becoming one of the basic underpinnings of cloud storage. Rapid data growth, the proliferation of big data lakes in the enterprise, an increased demand for private and hybrid cloud storage and a growing need for programmable and scalable storage infrastructure are pulling object storage technology from its niche existence to the mainstream as a sound alternative to file-based storage. An expanding list of object storage products, from both major storage vendors and startups, is another indication of object storage's increasing relevance. Moreover, object storage technology is reaching into network attached storage (NAS) use cases, with some object storage vendors positioning their products as viable network attached storage alternatives.

To accomplish the lofty goal of overcoming the limitations of traditional file- and block-level storage systems to reliably and cost-effectively support massive amounts of data, object storage systems focus on and break new ground when it comes to scalability, resiliency, accessibility, security and manageability. Let's examine how object storage systems do this.


SCALABILITY IS KEY TO OBJECT STORAGE

Complexity is anathema to extreme scalability. Object storage systems employ several techniques that are simple in nature but essential to achieving unprecedented levels of scale. To start with, object storage systems are scale-out systems that scale capacity, processing and networking resources horizontally by adding nodes. While some object storage products implement self-contained multifunction nodes that perform access, storage and control tasks in a single node, others consist of specialized node types. For instance, IBM Cleversafe, OpenStack Swift and Red Hat Ceph Storage differentiate between access and storage nodes; conversely, each node in Caringo Swarm 8 and EMC Elastic Cloud Storage (ECS) performs all object storage functions.

Unlike the hierarchical structure of file-level storage, object storage systems are flat, with a single namespace in which objects are addressed via unique object identifiers, thereby enabling unprecedented scale. "With 10^38 object IDs available per vault, we support a yottabyte-scale namespace, and with each object segmented into 4 MB segments, our largest deployments today are north of 100 petabytes of capacity, and we are prepared to scale to and beyond exabyte-level capacity," according to Russ Kennedy, IBM senior vice president of product strategy, Cleversafe.

Furthermore, object storage vendors are quick to note that their systems replace the locking that file-level storage requires to prevent multiple concurrent updates with versioning of objects on update, enabling capabilities like rollback and undeleting of objects as well as the inherent ability to access prior object versions. Finally, object storage systems replace the limited and rigid file system attributes of file-level storage with rich, customizable metadata that not only captures common object characteristics but can also hold application-specific information.

OBJECT OFFERS GREATER RESILIENCY

Traditional block- and file-level storage systems are stymied by fundamental limitations to support massive capacity. A case in point is data protection. It's simply unrealistic to back up hundreds of petabytes of data. Object systems are designed to not require backups; instead, they store data with sufficient redundancy so that data is never lost, even while multiple components of the object storage infrastructure are failing. Keeping multiple replicas of objects is one way of achieving this. On the downside, replication is capacity-intensive. For instance, maintaining six replicas requires six times the capacity of the protected data. As a result, object storage systems support the more efficient erasure coding data protection method in addition to replication. In simple terms, erasure coding uses advanced math to create additional information that allows for recreating data from a subset of the original data, analogous to RAID 5's ability to retrieve the original data from the remaining drives despite one failing drive. The degree of resiliency is typically configurable in contemporary object storage systems. The higher the level of resiliency, the more storage is required.
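To make the RAID 5 analogy concrete, here is a toy Python sketch of the idea behind erasure-coded redundancy using a single XOR parity fragment. It is purely illustrative and assumes nothing about how any particular object storage product implements its codes; real systems use Reed-Solomon-style codes that can survive multiple simultaneous failures.

```python
# Toy single-parity "erasure coding" sketch: any one lost fragment can be
# rebuilt from the survivors, at the cost of one extra fragment of capacity.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int = 4) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    fragments = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = fragments[0]
    for frag in fragments[1:]:
        parity = xor_bytes(parity, frag)
    return fragments + [parity]

def recover(fragments: list, missing: int) -> bytes:
    """Rebuild the fragment at index `missing` from the surviving ones."""
    survivors = [f for i, f in enumerate(fragments) if i != missing and f is not None]
    rebuilt = survivors[0]
    for frag in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, frag)
    return rebuilt

pieces = encode(b"object payload to protect", k=4)
lost_index = 2
damaged = [None if i == lost_index else p for i, p in enumerate(pieces)]
assert recover(damaged, lost_index) == pieces[lost_index]
```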


Erasure coding saves capacity but impacts performance, especially if erasure coding is performed across geographically dispersed nodes. "Although we support geographic erasure coding, performing erasure coding within a data center, but using replication between data centers is often the best capacity/performance tradeoff," said Paul Turner, chief marketing officer at Cloudian. With large objects yielding the biggest erasure coding payback, some object storage vendors recommend data protection policies based on object size. EMC ECS uses erasure coding locally and replication between data centers, but combines replication with data reduction, achieving an overall data reduction ratio similar to that of geo-dispersed erasure coding without the performance penalty of the latter.


Object storage use cases

- Backup and archival: Object storage systems are cost-effective, highly scalable backup and archival platforms, especially if data needs to be available for continuous access.
- Big data: Several object storage products offer certified S3 Hadoop Distributed File System interfaces that allow Hadoop to directly access data on the object store.
- Content distribution network: Used to globally distribute content like movies using policies to govern access, with features like automatic object deletion based on expiration dates.
- Content repositories: Used as content repositories for images, videos and other content accessed through applications or via file system protocols.
- Enterprise collaboration: Geographically distributed object storage systems are used as collaboration platforms where content is accessed and shared across the globe.
- Log storage: Used to capture massive amounts of log data generated by devices and applications, ingested into the object store via a message broker like Apache Kafka.
- Storage as a service: Object storage powers private and public clouds of enterprises and service providers.
- Network attached storage (NAS): Used in lieu of dedicated NAS systems, especially if there is another use case that requires an object storage system. –J.G.


The ability to detect and, if possible, correct object storage issues is pertinent for a large, geographically dispersed storage system. Continuous monitoring of storage nodes, automatic relocation of affected data, and the ability to self-heal and self-correct without human intervention are critical capabilities to prevent data loss and ensure continuous availability.


STANDARDS-BASED ACCESSIBILITY

Object storage is accessed via an HTTP RESTful API to perform the various storage functions, with each product implementing its own proprietary APIs. All object storage products also support the Amazon Simple Storage Service (S3) API, which has become the de facto object storage API standard—with by far the largest number of applications using it. The S3 API is also extensive, going beyond simple PUT, GET and DELETE operations to support complex storage operations. The one thing to be aware of, though, is that most object storage vendors only support an S3 API subset, and understanding the S3 API implementation limitations is critical to ensure wide application support. Besides Amazon S3, most object storage vendors also support the OpenStack Swift API.

File system protocol support is common in object storage systems, but implementations vary by product. For instance, EMC ECS has geo-distributed active/active NFS support; and with ECS' consistency support, it's a pretty strong geo-distributed NAS product. Scality claims EMC Isilon-level NAS performance, and the NetApp StorageGRID Webscale now offers protocol duality by having a one-to-one relationship between objects and files. Other object storage products provide file system support through their own or third-party cloud storage gateways like the ones offered by Avere, CTERA Networks, Nasuni and Panzura. Both Caringo Swarm and EMC ECS offer Hadoop HDFS interfaces, allowing Hadoop to directly access data in their object stores. HGST Amplidata and Cloudian provide S3-compliant connectors that enable Apache Spark and Hadoop to use object storage as a storage alternative to HDFS.
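Because most products expose the S3 API described above, a short, hedged sketch of what that access looks like from an application may help. The endpoint URL, bucket, object key, credentials and metadata below are placeholders, and individual products support different subsets of these calls.

```python
# Minimal sketch of S3-style object access with custom metadata, using boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",   # hypothetical on-premises endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# PUT an object with rich, application-specific metadata attached to it.
s3.put_object(
    Bucket="media-archive",
    Key="videos/2016/keynote.mp4",
    Body=b"example payload",
    Metadata={"department": "marketing", "retention": "7y", "camera": "unit-12"},
)

# Read the metadata back without downloading the object itself.
head = s3.head_object(Bucket="media-archive", Key="videos/2016/keynote.mp4")
print(head["Metadata"])

# DELETE when the object is no longer needed.
s3.delete_object(Bucket="media-archive", Key="videos/2016/keynote.mp4")
```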

ENCRYPTION PROVIDES NEEDED SECURITY

A common use case of an object storage product by service providers is public cloud storage. Although at-rest and in-transit encryption are a good practice for all use cases, encryption is a must for public cloud storage. The majority of object storage products support both at-rest and in-transit encryption, using a low-touch at-rest encryption approach where encryption keys are generated dynamically and stored in the vicinity of encrypted objects without the need for a separate key management system. Cloudian HyperStore and HGST Amplidata support client-managed encryption keys in addition to server-side managed encryption keys, giving cloud service providers an option to allow their customers to manage their own keys. Caringo Swarm, the DDN WOS Object Storage platform and Scality RING currently don't support at-rest encryption, relying on application-based encryption of data before it's written to the object store.
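For the client-managed key option mentioned above, Amazon's S3 API defines a server-side encryption with customer-provided keys (SSE-C) mechanism. The sketch below shows the general shape of such a call with boto3; the key, bucket and object names are hypothetical, and not every S3-compatible product implements SSE-C.

```python
# Hedged sketch of client-managed keys via S3 SSE-C: the caller supplies the
# key on every request and the service does not retain it.
import os
import boto3

s3 = boto3.client("s3")
customer_key = os.urandom(32)   # 256-bit key held by the customer, not the provider

s3.put_object(
    Bucket="tenant-data",                   # hypothetical bucket
    Key="records/account-42.json",
    Body=b'{"balance": 100}',
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)

# The same key must be presented to read the object back.
obj = s3.get_object(
    Bucket="tenant-data",
    Key="records/account-42.json",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)
print(obj["Body"].read())
```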


Leading object storage products

Caringo Swarm 8
Delivery options: Software-defined; delivered software-only to run on commodity hardware.
Notable facts: Out-of-box integration with Elasticsearch for fast object search.

IBM Cleversafe
Delivery options: Software-defined; delivered software-only to run on certified hardware.
Notable facts: Multi-tiered architecture with no centralized servers; extreme scalability enabled by peer-to-peer communication of storage nodes.

Cloudian HyperStore
Delivery options: Software-defined; delivered as turnkey integrated appliance or software-only to run on commodity hardware.
Notable facts: Stores metadata with objects, but also in a distributed NoSQL Cassandra database for speed.

DDN WOS Object Storage platform
Delivery options: Software-defined; delivered as turnkey integrated appliance or software-only to run on commodity hardware.
Notable facts: Configurations start as small as one node; able to scale to hundreds of petabytes.

EMC Elastic Cloud Storage (ECS)
Delivery options: Software-defined; delivered as turnkey integrated appliance or software-only to run on commodity hardware.
Notable facts: Features highly efficient strong consistency on access of geo-distributed objects; designed with geo-distribution in mind.

Hitachi Content Platform (HCP)
Delivery options: Software-defined; delivered as turnkey integrated appliance or software-only to run on commodity hardware, or as a managed service hosted by HDS.
Notable facts: Extreme density, with a single cluster able to support up to 800 million objects and 497 PB of addressable capacity; an integrated portfolio: HCP cloud storage, HCP Anywhere File Sync & Share and Hitachi Data Ingestor (HDI) for remote and branch offices.

HGST Amplidata
Delivery options: Software-defined; delivered as a turnkey rack-level system.
Notable facts: Uses HGST Helium filled hard drives to maximize power efficiency, reliability and capacity.

NetApp StorageGRID Webscale
Delivery options: Software-defined; delivered as software appliance or turnkey integrated appliance.
Notable facts: Stores metadata, including the physical location of objects, in a distributed NoSQL Cassandra database.

Red Hat Ceph
Delivery options: Software-defined; delivered software-only to run on commodity hardware.
Notable facts: Based on the open-source Reliable Autonomic Distributed Object Store (RADOS); features strong consistency on write.

Scality RING
Delivery options: Software-defined; delivered software-only to run on commodity hardware.
Notable facts: Stores metadata in a custom-developed distributed database; claims EMC Isilon performance if used as NAS.

SwiftStack Object Storage System
Delivery options: Software-defined; delivered software-only to run on commodity hardware.
Notable facts: Based on OpenStack Swift; enterprise offering of Swift with cluster and management tools and 24-7 support.


LDAP and AD authentication support for users accessing the object store is common in contemporary object storage systems. Support for AWS v2 or v4 authentication to provide access to vaults—and objects within vaults—is less common and should be an evaluation criterion when selecting an object storage system.

OBJECT STORAGE MINIMIZES MANAGEMENT

Object storage systems are designed to minimize human storage administration through automation, policy engines and self-correcting capabilities. "The Cleversafe system enables storage administrators to handle 15 times the storage capacity of traditional storage systems," claims Kennedy. Object storage systems are designed for zero downtime, and all administration tasks can be performed without service disruption—from upgrades, hardware maintenance and refreshes to adding capacity and changing data centers. Policy engines enable the automation of object storage behavior, such as when to use replication vs. erasure coding, under what circumstances to change the number of replicas to support usage spikes, and what data centers to store objects in based on associated metadata. While commercial object storage products typically provide management tools, technical support and professional services to deploy and keep object storage systems humming, the open-source OpenStack Swift product demands a higher degree of self-reliance. For companies that don't have the internal resources to deploy and manage OpenStack Swift, SwiftStack sells an enterprise offering of Swift with cluster and management tools, enterprise integration and 24-7 support.
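As one hedged illustration of what a policy engine can automate, the S3 API exposes bucket lifecycle rules that expire objects without administrator involvement. The bucket, prefix and retention period below are hypothetical, and each product documents which lifecycle features it supports.

```python
# Sketch of policy-driven automation using the S3 lifecycle API via boto3.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="log-archive",                      # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-device-logs",
                "Filter": {"Prefix": "devices/"},
                "Status": "Enabled",
                # Objects under the prefix are deleted automatically after a
                # year, with no storage administrator involvement.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```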

TAKE AWAY

Without question, object storage systems are on the rise. Their ability to scale and their accessibility via APIs make them suitable in use cases where traditional storage systems simply can't compete. They're also increasingly becoming a NAS alternative, with some object storage vendors claiming parity with NAS systems. With a growing list of object storage products, choosing an object storage system becomes increasingly challenging, however. Overall features, ensuring that your use cases are supported, cost and vendor viability are primary decision criteria when investigating object storage systems. Because this is still a relatively new technology with capabilities varying and in flux, reference checks and (if possible) proof-of-concept testing are highly advisable before finalizing your object storage product selection.

JACOB N. GSOEDL is a freelance writer and corporate VP of IT Business Solutions. He can be reached at [email protected].


SNAPSHOT 1
Need for more capacity driving most NAS purchases

Top drivers for new NAS purchase (up to three responses permitted):
Need more capacity: 73%
Improve performance: 30%
Replacing existing NAS: 25%
New storage for new/specific app: 20%
Use as backup target: 14%
Better support virtual servers: 10%
Use as archive repository: 8%
Adding a new storage tier: 5%
For branch office(s): 3%

Most critical features needed with new NAS (up to two responses permitted):
File system size: 48%
Clustering support: 37%
Performance and capacity scale separately: 32%
Support for both NFS and CIFS (SMB 3.0): 22%
Support for 10Gig Ethernet: 10%

The average capacity of planned new NAS systems: 156 TB

SOURCE: TECHTARGET RESEARCH

CLOUD BACKUP

Tap the cloud to ease backup burdens
Implement scalable and dependable cloud backups for data, applications and virtual machines. BY CHRIS EVANS

DATA IS THE lifeblood of most businesses today. And yet the job of backing up that data is probably one of the least loved but most important processes in IT. Few organizations could survive without the email and productivity tools they use every day, not to mention the data (both current and archived) these applications generate. And, at the other end of the spectrum, entire business sectors, such as finance, couldn't operate without huge IT infrastructures and the volumes of data they contain. This makes it essential to implement a data protection plan that includes putting into place a reliable process for backing up that data.

The public cloud and, in particular, cloud storage provide organizations with a huge opportunity to implement scalable, manageable and dependable backups. These cloud backup options—such as Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform—effectively offer unlimited storage capacity at the end of a network, with no need to understand how the supporting infrastructure is constructed, managed or upgraded. Public cloud vendors have also introduced multiple tiers of storage into their products to stay competitive. AWS, for example, offers three levels of storage (Standard, Infrequent Access and Glacier), each of which delivers different service levels and price points.


Google's public cloud mirrors AWS offerings with its Standard, Nearline and Durable Reduced Availability storage tiers. There's plenty of raw infrastructure available to store your backup data. The question to ask now is what data should be stored in the cloud and what cloud backup options do you use to back it up?

WHERE APPLICATIONS RUN MATTERS

To determine what data to store in the cloud and how to back it up, we need to first see how IT deploys applications. Nowadays, businesses can run applications from four main areas:

1. On premises (including private cloud). This happens when running applications within a private data center managed by local IT teams. Systems are built on internal infrastructure and historically have been backed up using similar infrastructure within the data center, replicating data to another location or taking backups off-site with removable media.

2. Co-located. Rather than sit in a customer's data center, physical rack space is rented at a co-location facility that manages the environmental aspects of the data center, while the customer continues to own the server hardware. Co-location provides an opportunity for third-party businesses to offer services like backup that are deployed in the same co-location facility. This offloads the work of backup but delivers low latency and high-throughput connectivity to backup infrastructure because of its physical proximity.

3. Public cloud (IaaS). The public cloud can be used to deploy virtual servers and applications without businesses owning or managing any of the underlying hardware. Infrastructure as a service (IaaS) vendors won't provide backup capabilities outside of the requirement to return failing systems back to normal operation, however. So if a server crashes or data is lost, the IaaS vendor will simply return the system to the previous state of operation.

4. Public cloud (PaaS/SaaS). Platform as a service (PaaS) and software as a service (SaaS) have been widely adopted for the most easily packaged and transferrable services, such as email (e.g., Office 365), and applications, such as CRM (e.g., Salesforce). PaaS and SaaS offerings operate in a similar way to IaaS, in that the platform provider ensures systems are always up and running with the latest version of applications and data. They won't directly provide the ability to recover historical data (e.g., when a user inadvertently deletes vital account records), however.

BACKUP OPTIONS FOR THE PUBLIC CLOUD

Organizations have a number of choices among cloud backup options that take advantage of public cloud storage, including:

- Back up directly to the public cloud. Write data directly to AWS Simple Storage Service (S3), Azure, Google or one of many other cloud infrastructure providers. (See the sketch after this list.)

- Back up to a service provider. Write data to a service provider offering backup services in a managed data center.

- Implement disaster recovery as a service (DRaaS). A number of vendors offer DR services that manage the backup and restore process directly, focusing on the application/virtual machine rather than just data. These DRaaS offerings also work with PaaS/SaaS applications to secure data already stored in the public cloud.
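A minimal sketch of the first option, writing a backup artifact directly to S3 with the AWS SDK for Python (boto3), is shown below. The bucket, key and file names are placeholders, and credentials are assumed to come from the environment or an instance role.

```python
# Sketch: push one backup file straight to S3, using a colder storage tier.
import boto3

s3 = boto3.client("s3")

def backup_to_cloud(local_path: str, bucket: str, key: str) -> None:
    """Upload a single backup artifact directly to object storage."""
    s3.upload_file(
        Filename=local_path,
        Bucket=bucket,
        Key=key,
        ExtraArgs={"StorageClass": "STANDARD_IA"},  # Standard - Infrequent Access
    )

backup_to_cloud("nightly/db-2016-05-01.tar.gz",      # hypothetical local file
                "example-backup-bucket",              # hypothetical bucket
                "db/db-2016-05-01.tar.gz")
```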

Existing backup software providers have extended their products to take advantage of cloud storage as a native backup target. Veritas (formerly part of Symantec) updated NetBackup to version 7.7.1 toward the end of 2015, extending AWS S3 support to cover the Standard - Infrequent Access (IA) tier. (Version 7.7 originally introduced a cloud connector feature with the ability to write directly to S3.)

The Commvault Data Platform (formerly called Simpana) natively supports all of the major public cloud providers and a range of object store vendors—including Caringo and DataDirect Networks. It also supports an extended set of vendors through standardization on the S3 protocol, highlighting how S3 as a standard is being used to provide interoperability between object stores and backup platforms, even if those systems are not running in the public cloud.

A number of storage vendors have also started to support native S3 backups from within their storage platforms. SolidFire introduced the ability to archive snapshots to S3 or other Swift-compatible object stores as part of the release of its Element OS Version 6 in March 2014. Zadara Storage, which offers a Virtual Private Storage Array (VPSA) either on customer premises or deployed at a co-location site, provides S3 support to archive snapshots that can either be restored to Amazon's Elastic Block Store (EBS) service or any other vendor's storage hardware.

"Cloud backup is no longer simply about shipping data to cheap storage locations. Today, entire applications can be migrated to, run from and backed up to and within public cloud infrastructure."

One word of caution when deciding to use public cloud storage: Data written to S3 and other services won't be deduplicated by the cloud provider to reduce the amount of space consumed by the user (although they may deduplicate behind the scenes). This means data must be deduplicated before being written to the cloud if that feature is not built into a backup product.

One option to overcome this issue is to use software such as that from StorReduce. Its cloud-based virtual appliance deduplicates S3 data, storing only the unique data on the customer's S3 account. (You can write to StorReduce as the target in real time and it will write to S3 in real time.) This significantly reduces the amount of data stored on S3, which translates to cost savings, both in data stored and the transfer costs for reading and writing to S3 itself.
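To make the deduplication caution concrete, here is a toy sketch of client-side deduplication before upload: data is cut into fixed-size chunks, each chunk is named by its SHA-256 digest, and a chunk is only uploaded if no object with that name already exists. This illustrates the general technique, not any vendor's implementation; the bucket name and chunk size are arbitrary choices.

```python
# Toy content-addressed deduplication in front of an S3 bucket.
import hashlib
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size chunks

def chunk_exists(bucket: str, digest: str) -> bool:
    try:
        s3.head_object(Bucket=bucket, Key=f"chunks/{digest}")
        return True
    except ClientError:      # a 404 here means the chunk is not stored yet
        return False

def dedup_upload(path: str, bucket: str) -> list:
    """Upload only chunks not already stored; return the file's chunk list."""
    manifest = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            if not chunk_exists(bucket, digest):
                s3.put_object(Bucket=bucket, Key=f"chunks/{digest}", Body=chunk)
            manifest.append(digest)
    return manifest
```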


OPENING THE DOOR TO MSP AND SAAS BACKUPS

Products such as Spanning (acquired by EMC in 2014) and Backupify (acquired by Datto the same year) enable organizations to back up SaaS data. Pricing is typically calculated on a per-user-per-month basis, which has to be added into the overall cost of using a SaaS.

Managed service providers (MSP) offer backup services that take advantage of co-location facilities to offer cloud backup options. If IT is already using hosting services from companies such as Equinix, then backups can be performed within the data center across the high-speed network implemented by the hosting company, rather than going out onto the public Internet.

A number of software vendors, including Asigra and Zerto, deliver versions of their products specifically designed so that MSPs can deliver a white-label backup platform to their customers. The benefit of using a service provider for backup is in the security of keeping data within the MSP's facilities. That way, data doesn't have to traverse the public Internet, which may resolve issues of compliance for some organizations. MSPs can also deliver "value-added" services that let customers run applications in DR mode if primary systems fail.

Datto is an example of a vendor that provides customers with the ability to run applications in DR mode in a cloud. It offers a number of appliances that back up VMs locally, replicating them to the private cloud Datto purpose-built to allow customers to failover their applications in the event of a disaster.

Druva provides a similar service with Phoenix DRaaS, where entire applications can be backed up to the cloud (through the replication of VM snapshots) and restarted within AWS. The Druva application manages issues like IP address changes that need to be put in place as the application is moved to run in a different network.


WHAT TO BACK UP? THAT IS THE QUESTION

An important consideration when examining cloud backup options is deciding what exactly to back up. It is possible to back up only application data or an entire virtual machine, for example. The advantage of a VM backup is that it makes it possible to restart an application in the cloud in the event of a disaster at the primary site. This also means IT doesn't need to have specific DR hardware and can instead operate applications from within the cloud.

"The S3 API provides a common standard that allows backup applications to write data to both object storage and public cloud providers."

SaaS, meanwhile, has allowed many IT shops to outsource common applications to the public cloud—most notably email, customer relationship management and collaboration tools. While SaaS removes the need to manage infrastructure and applications, it doesn't fully provide data management capabilities. A SaaS provider will, for example, recover data from hardware or application failure but not from common user errors such as the accidental deletion of files or emails.

CLOUD BACKUPS: TRADITIONAL VS. APPLIANCE

Traditional backup software applications have been modified to write directly to the cloud, typically using standard protocols like Amazon's S3 API. In this instance, the application needs to perform any data reduction tasks like deduplication before pushing the data out, as stored data is charged per terabyte.

By comparison, application gateways can be used to cache data as it is being written to the cloud storage. The appliance can then perform deduplication and also cache data locally, allowing for quicker restores from backup where needed. Typically, the majority of restores occur within the first few days of a backup being taken.

Traditional vs. appliance-based backups are important to consider because the public cloud is increasingly becoming a practical target for data backups. The effective, limitless scale of the cloud takes away many of the operational headaches associated with managing backup infrastructure.

"Public cloud takes away the need for many IT shops to build and manage their own DR site."

Obviously, there is a tradeoff between running backup locally and using cloud as the target, particularly in managing network bandwidth. However, with the ability to move entire virtual machines into the cloud and run them there in disaster recovery mode, we could see a serious decline in the use of traditional backup applications as IT realizes it no longer needs to build out dedicated disaster recovery facilities or suffer the impractical nature of shipping physical media off-site.

CHRIS EVANS is an independent consultant with Langton Blue.


SNAPSHOT 2
Users favor venerable NFS and 10 GigE for new NAS systems

Top five apps to be deployed on new NAS systems (multiple selections allowed):
Database applications: 59%
Web and application serving: 39%
Support for virtual servers: 33%
Unstructured data (e.g., user shares): 24%
Virtual desktop infrastructure: 14%

NFS is still the preferred protocol for new NAS:
NFS: 79%
CIFS: 43%
SMB 3.0: 21%

69% plan to use 10 Gbps Ethernet to hook up their new NAS

SOURCE: TECHTARGET RESEARCH

HADOOP

Big data requires big storage
Hadoop deployments evolve thanks to enterprise storage vendors and the Apache community. BY JOHN WEBSTER

IT'S COMMON FOR storage discussions to begin with a reference to data growth. The implied assumption is that companies will want to capture and store all the data they can for a growing list of analytics applications. Today, because the default policy for retaining stored data within many enterprises is "save everything forever," many organizations are regularly accumulating multiple petabytes of data. Despite what you might think about the commoditization of storage, there is a cost to storing all of this data. So why do it? Because executives today realize that data has intrinsic value due to advances in data analytics. In fact, that data can be monetized. There's also an understanding at the executive level that the value of owning data is increasing while the value of owning IT infrastructure is decreasing.

Hadoop Distributed File System (HDFS) is fast becoming the go-to tool enterprise storage users are adopting to tackle the big data problem, and here's a closer look at how it became the primary option.

WHERE TO PUT ALL THAT DATA?

Traditional enterprise storage platforms—disk arrays and tape siloes—aren't up to the task of storing all of the data.


Data center arrays are too costly for the data volumes envisioned, and tape, while appropriate for large volumes at low cost, elongates data retrieval. The repository sought by enterprises today is often called the big data lake, and the most common instantiation of these repositories is Hadoop. Originated at the Internet data centers of Google and Yahoo, Hadoop was designed to deliver high-analytic performance coupled with large-scale storage at low cost.

There is a chasm between large Internet data centers and enterprise data centers that's defined by differences in management style, spending priorities, compliance and risk-avoidance profiles, however. As a result, the Hadoop Distributed File System was not originally designed for long-term data persistence. The assumption was data would be loaded into a distributed cluster for MapReduce batch processing jobs and then unloaded—a process that would be repeated for successive jobs. Nowadays, enterprises not only want to run successive MapReduce jobs, they want to build multiple applications that, for example, converge analytics with the data generated by online transaction processing (OLTP) on top of the Hadoop Distributed File System. Common storage for multiple types of analytics users is needed as well (see Figure 1, Hadoop's multiple application environment supported by YARN and the Hadoop Distributed File System). Some of the more popular applications include Apache HBase for online transaction processing, and Apache Spark and Storm for data streaming as well as real-time analytics. To do this, data needs to be persisted, protected and secured for multiple user groups and for long periods of time.

FILLING THE HADOOP STORAGE GAP

Current versions of the Hadoop Distributed File System have storage management features and functions consistent with persisting data, and the Apache open-source community works continuously to improve HDFS to make it more compatible with enterprise production data centers. Some important features are still missing, however. So the challenge for administrators is to determine whether or not the HDFS storage layer can in fact serve as an acceptable data-preservation foundation for the Hadoop analytics platform and its growing list of applications and users. With multiple Apache community projects taking the attention of developers, users are often kept waiting for production-ready Hadoop storage functionality in future releases. The current list of Hadoop storage gaps to be closed includes:

- Inefficient and inadequate data protection and DR capabilities. HDFS relies on the creation of replicated data copies (usually three) at ingest to recover from disk failures, data loss scenarios, loss of connectivity and related outages. While this process does allow a cluster to tolerate disk failure and replacement without an outage, it still doesn't totally cover data loss scenarios that include data corruption. In a recent study, researchers at North Carolina State University found that while Hadoop provides fault tolerance, "data corruptions still seriously affect the integrity, performance, and availability of Hadoop systems." This process also makes for very inefficient use of storage media—a critical concern when users wish to retain data in the Hadoop cluster for up to seven years, as may be required for regulatory compliance reasons.


The Apache Hadoop development community is looking at implementing erasure coding as a second "tier" for low-frequency-of-access data in a new version of the Hadoop Distributed File System later this year.

HDFS also cannot replicate data synchronously between Hadoop clusters, a problem because synchronous replication is critical for supporting production-level DR operations. And while asynchronous replication is supported, it's open to the creation of file inconsistencies across local/remote cluster replicas over time.


[Figure 1. Example of Hadoop analytics environment structure: Hadoop's multiple application environment (MapReduce, SQL/NoSQL and in-memory analytics, for example) supported by YARN (Yet Another Resource Negotiator) as a platform OS and HDFS as persistent storage for all applications running above the HDFS storage layer. Workload types shown: Batch (MapReduce), SQL (Hive), Online (HBase, Accumulo) and In-Memory (Spark), plus others, all running on multitenant processing via YARN over HDFS storage. Source: Hortonworks and Evaluator Group.]


- Inability to disaggregate storage from compute resources. HDFS binds compute and storage together to minimize the distance between processing and data for performance at scale, resulting in some unintended consequences when HDFS is used as a long-term persistent storage environment. To add storage capacity in the form of data nodes, an administrator has to add processing resources as well, needed or not. And remember that 1 TB of usable storage equates to 3 TB after copies are made (see the quick calculation after this list).

- Data in/out processes can take longer than the actual query process. One of the major advantages of using Hadoop for analytics applications vs. traditional data warehouses lies in its ability to run queries against very large volumes of unstructured data. This is often accomplished by copying data from active data stores to the big data lake, a process that can be time-consuming and network resource-intensive, depending on the amount of data. Perhaps more critically from the standpoint of Hadoop in production, this can lead to data inconsistencies, causing application users to question whether or not they are querying a single source of the truth.
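A quick back-of-the-envelope calculation shows why the replication overhead above matters and why a planned erasure coding tier is attractive. The RS(6,3) layout used here is only an illustrative Reed-Solomon choice, not a statement about what any particular Hadoop release ships.

```python
# Raw-capacity overhead: 3x replication vs. an illustrative RS(6,3) layout.
usable_tb = 100.0

replication_factor = 3
replicated_raw_tb = usable_tb * replication_factor             # 300 TB of raw disk

data_fragments, parity_fragments = 6, 3
ec_overhead = (data_fragments + parity_fragments) / data_fragments
erasure_coded_raw_tb = usable_tb * ec_overhead                  # 150 TB of raw disk

print(f"3x replication : {replicated_raw_tb:.0f} TB raw for {usable_tb:.0f} TB usable")
print(f"RS(6,3) coding : {erasure_coded_raw_tb:.0f} TB raw for {usable_tb:.0f} TB usable")
```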

ALTERNATIVE HADOOP ADD-ONS AND STORAGE SYSTEMS

The Apache community often creates add-on projects to address Hadoop deficiencies. Administrators can use the Raft distributed consensus protocol to recover from cluster failures without recomputation, and the DistCp (distributed copy) tool for periodic synchronization of clusters across WAN distances. Falcon, a feed processing and management system, addresses data lifecycle and management, and the Ranger framework centralizes security administration. These add-ons have to be installed, learned and managed as separate entities, and each has its own lifecycle, requiring tracking and updating.
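Of these add-ons, DistCp is the simplest to picture: it is driven from the command line, so "periodic synchronization" typically amounts to a scheduled invocation. The sketch below wraps one such run in Python; the cluster URIs and paths are hypothetical, and the -update and -delete flags should be verified against the DistCp version in use.

```python
# Minimal sketch: periodically mirror a source cluster path to a remote cluster
# with DistCp. Cluster addresses and paths are hypothetical examples.
import subprocess
import time

SOURCE = "hdfs://prod-nn:8020/warehouse/events"   # hypothetical source cluster path
TARGET = "hdfs://dr-nn:8020/warehouse/events"     # hypothetical remote replica
INTERVAL_SECONDS = 6 * 3600                       # sync every six hours

def sync_once() -> int:
    # -update copies only changed files; -delete removes files gone from the source.
    cmd = ["hadoop", "distcp", "-update", "-delete", SOURCE, TARGET]
    return subprocess.call(cmd)

if __name__ == "__main__":
    while True:
        print("distcp exit code:", sync_once())
        time.sleep(INTERVAL_SECONDS)
```

Because each run is periodic rather than continuous, the remote copy can lag and diverge between runs, which is exactly the consistency caveat raised above.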

To address these issues, a growing number of administrators have begun to integrate data-center-grade storage systems with Hadoop, systems that come with the required data protection, integrity, security and governance features built in. The list of "Hadoop-ready" storage systems includes EMC Isilon and EMC Elastic Cloud Storage (ECS), Hitachi's Hyper Scale-Out Platform, IBM Spectrum Scale and NetApp's Open Solution for Hadoop. Let's look at two of these external Hadoop storage systems in more detail to understand the potential value of this alternate route.

EMC ELASTIC CLOUD STORAGE

ECS is available as a preconfigured hardware/software appliance, or as software that can be loaded onto scale-out racks of commodity servers. It supports object storage services as well as HDFS and NFS v3 file services. Object access is supported via Amazon Simple Storage Service (S3), Swift, OpenStack Keystone V3 and EMC Atmos interfaces. ECS uses Hadoop as a protocol rather than a file system and requires the installation of code at the Hadoop cluster level; the ECS data service presents Hadoop cluster nodes with Hadoop-compatible file system access to its unstructured data.
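Because ECS exposes an S3 interface, generic S3 tooling can typically read and write to it. The short sketch below uses boto3 against a hypothetical ECS endpoint; the endpoint URL, credentials, bucket and object names are placeholders, not values from the article.

```python
# Minimal sketch: write and read an object over an S3-compatible interface.
# Endpoint, credentials, bucket and key are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com:9021",   # hypothetical ECS S3 endpoint
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

s3.put_object(
    Bucket="analytics-landing",
    Key="events/2016-05-01.json",
    Body=b'{"event": "example"}',
    Metadata={"source": "clickstream"},   # user metadata of the kind ECS can index for search
)

obj = s3.get_object(Bucket="analytics-landing", Key="events/2016-05-01.json")
print(obj["Body"].read())
```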


ECS supports both solid-state and hybrid hard drive storage embedded into ECS nodes, and scales up to 3.8 PB in a single rack depending on user configuration. Data and storage management functions include snapshot, journaling and versioning, and ECS implements erasure coding for data protection. All ECS data is erasure coded except the index and metadata, where ECS maintains three copies of the data. Additional features of value in the context of enterprise production-level Hadoop include:

- Consistent write performance for small and large file sizes. Small file writes are aggregated and written as one operation, while parallel node processing is applied to large file access.

- Multisite access and three-site support. ECS allows for immediate data access from any ECS site in a multisite cluster supported by strong consistency (applications are presented with the latest version of data, regardless of location, and indexes across all locations are synchronized). ECS also supports primary, remote and secondary sites across a single cluster, as well as asynchronous replication.

- Regulatory compliance. ECS allows administrators to implement time-based data-retention policies. It supports compliance standards such as SEC Rule 17a-4. EMC Centera CE+ lockdown and privileged delete are also supported.

- Search. Searches can be performed across user-defined and system-level metadata. Indexed searching on key-value pairs is enabled with a user-written interface.

- Encryption. Inline data-at-rest encryption with automated key management, where keys generated by ECS are maintained within the system.

IBM SPECTRUM SCALE

IBM Spectrum Scale is a scalable (to the multi-PB range), high-performance storage system that can be natively integrated with Hadoop (no cluster-level code required). It implements a unified storage environment, which means support for both file and object-based data storage under a single global namespace. For data protection and security, Spectrum Scale offers snapshots at the file system or file-set level, and backup to an external storage target (backup appliance and/or tape). Storage-based security features include data-at-rest encryption and secure erase, plus LDAP/AD for authentication. Synchronous and asynchronous data replication at LAN, MAN and WAN distances with transactional consistency is also available.

Spectrum Scale supports automated storage tiering, using flash for performance and multi-terabyte mechanical disk for inexpensive capacity, with automated, policy-driven data movement between storage tiers. Tape is available as an additional archival storage tier. Policy-driven data compression can be implemented on a per-file basis for an approximately 2x improvement in storage efficiency and reduced processing load on Hadoop cluster nodes. And, for mainframe users, Spectrum Scale can be integrated with IBM z Systems, which often play the role of remote data islands when it comes to Hadoop.


A SPARK ON THE HORIZON

Apache Spark, a platform for big data analytics, runs MapReduce applications faster than Hadoop and, like Hadoop, is a multi-application platform that offers analysis of streaming data. Spark's more efficient code base and in-memory processing architecture accelerate performance while still leveraging commodity hardware and open-source code. Unlike Hadoop, however, Spark does not come with its own persistent data storage layer, so the most common Spark implementations are on Hadoop clusters using HDFS.
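As a small illustration of the Spark-on-HDFS pairing described here, the PySpark sketch below reads data persisted in HDFS and runs a simple aggregation in memory. The namenode address, path and column name are hypothetical.

```python
# Minimal PySpark sketch: read data persisted in HDFS and run a simple count.
# The HDFS URI, path and "status" column are hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-backed-spark").getOrCreate()

# HDFS remains the persistent storage layer; Spark does its processing in memory.
df = spark.read.json("hdfs://prod-nn:8020/warehouse/events/2016/05/*.json")
print(df.filter(df["status"] == "error").count())

spark.stop()
```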

The growth of Spark is the result of growing interest in stream processing and real-time analytics. Again, the Hadoop Distributed File System wasn't originally conceived to function as a persistent data store underpinning streaming analytics applications. Spark will make storage performance tiering for Hadoop even more attractive, yet another reason to consider marrying Hadoop with enterprise storage.

JOHN WEBSTER is a senior partner and analyst at the Evaluator Group.


HOT SPOTS SCOTT SINCLAIR

When simple storage isn't so simple
Storage simplicity is about the tangible benefits that efficiency delivers.


DELIVERING VALUE THROUGH innovation is the goal of many of today's technology development firms. Enterprise storage systems are no different. While each new storage product seeks to deliver substantial benefits, some are more tangible than others. One specific feature, so-called "simplicity," commonly finds itself on the wrong end of the tangibility spectrum, too often delivering vague or even immaterial benefits to customers. The use of the term "simple" in storage product marketing is so pervasive that its effectiveness has been weakened. In fact, I've yet to see an enterprise storage system (traditional storage arrays, mission-critical tier-one storage monoliths, or even open-source storage software) that isn't marketed as "simple." For a simplicity claim to be truly effective, a product needs to deliver a measurable financial benefit to businesses. Translating simplicity claims into business impacts is often a challenge, however. So I am proposing a different way of thinking about simplicity.

TWO SIMPLE WAYS TO THINK OF SIMPLICITY

1. Look past the user interface: Most IT systems should have at least some form of graphical user interface, and that interface should endeavor to make it easy to digest all necessary information while reducing the number of steps required to use the product effectively. For the vast majority of enterprise storage systems, graphical interface design has hit the law of diminishing returns, though. In addition, the definition of a simple interface is relative to the user. Reducing the number of steps from five to two, for example, should improve the ease of use, but not if the user is already adept at those five steps. And, for IT organizations that use scripting, these simplicity innovations can provide little to no benefit in everyday use. For these environments, consistency is usually more critical.

2. Efficiency is the new simplicity: Without improving efficiency, notions of simplicity are meaningless. Reducing the number of storage elements to be managed, deployed or supported delivers far more tangible and impactful benefits to the bottom line than simply reducing the number of steps in an interface.


Greater efficiency offers obvious benefits to capital costs, as less equipment translates directly to reduced capital expenses. Here, I focus on operational expenditures, which can often be reduced in one of two ways. The first method is to reduce the number of storage elements to manage (e.g., managing one large data pool instead of dozens of small ones). The second method reduces the number of physical components or systems that need to be deployed and supported, such as deploying a single all-flash array to replace the performance of dozens of shelves of spinning drive media. Fortunately, the storage industry is rife with efficiency-augmenting innovations. The following are several examples of newer storage technologies that deliver tangible simplicity benefits through more efficient designs.

- Hyper-convergence: The ability to consolidate multiple servers, switches and enterprise storage systems into only a few hyper-converged platforms reduces not only the number of components IT is required to manage, but also the number of physical systems. In this way, hyper-converged vendors, such as Atlantis HyperScale, Nutanix and SimpliVity, simplify IT infrastructure to deliver measurable savings.

- Solid-state storage: For transactional workloads, achieving high performance with spinning media requires large quantities of spindles, increasing the number of storage elements required to manage, deploy and protect. The performance density of solid-state greatly reduces the amount of equipment required, and therefore simplifies infrastructure deployment and design. The net result is a reduction in the cost of operations. In addition to this performance density advantage, multiple flash storage vendors offer additional efficiency capabilities. All-flash enterprise storage systems, such as EMC's XtremIO, NetApp's SolidFire and now Nimble's Adaptive Flash, provide scale-out architectures that reduce the number of management elements. While not scale-out, Pure Storage offers a modular array that allows IT to expand performance across hardware generations, reducing the demand for incremental deployments.

- Scale-out file or object storage: For high-capacity workloads, scale-out file and object storage systems deliver a single, massively scalable pool of storage, reducing the number of storage elements to manage. Products from vendors such as Caringo, IBM/Cleversafe, Cloudian, EMC, HGST, Qumulo and Scality can significantly reduce the number of file systems that organizations manage. In addition, some of these products offer automatic multisite resiliency, eliminating the need to manage dozens of replication processes.


- Software-defined storage (SDS): Abstracting storage functionality from the underlying hardware can provide greater deployment flexibility and management consolidation for multiple heterogeneous storage elements. Either of these features can allow for a more efficient infrastructure design to improve simplicity and reduce operational costs. Some SDS offerings enable organizations to further reduce the amount of storage infrastructure required. DataCore with its Parallel I/O technology, for example, claims to effectively leverage parallel processing to deliver greater performance from existing components, further extending gains from efficiency.

TWO SIMPLE QUESTIONS WORTH ASKING

These are just a few examples of storage innovations that are reducing operational costs by delivering greater efficiency and simplicity. Ultimately, when a storage vendor says its product is simple, look past the interface and ask two questions:

1. Does the technology reduce the number of storage elements you have to manage? If yes, then it will help reduce Opex.

2. Does it reduce the number of physical storage components that need to be deployed and supported? If yes, the savings will likely be even greater.

SCOTT SINCLAIR is a storage analyst with Enterprise Strategy Group in Austin, Texas.


READ / WRITE MIKE MATCHETT

The sun rises on transformational cloud storage
Storage managers can seize the day and help their companies take advantage of the cloud.


WE HAVE BEEN hearing about the inevitable transition to the cloud for IT infrastructure since before the turn of the century. But, year after year, storage shops quickly become focused on only that year's prioritized initiatives, which tend to be mostly about keeping the lights on and costs low. A true vision-led shift to cloud-based storage services requires explicit executive sponsorship from the business side of an organization. But unless you cynically count the creeping use of shadow IT as an actual strategic directive to do better as an internal service provider, what gets asked of you is likely (and unfortunately) to perform only low-risk tactical deployments or incremental upgrades. Not exactly the stuff of business transformations.

Cloud adoption at a level for maximum business impact requires big executive commitment. That amount of commitment is, quite frankly, not easy to generate.

THE TRANSFORMATIONAL CLOUD

We all know cloud opportunities exist beyond a bit of cold archive data storage here and there. These services not only save real money, but can also significantly realign IT effort from just running infrastructure to increasing business value. Yet most IT shops run too lean and mean, and lack the skills, willpower or time to go off and actually risk transitioning core workloads to (or otherwise taking advantage of) hybrid or public cloud-based storage services. Instead, the prevalent attitude is to passively accept that some hybrid cloud usage is inevitably going to creep in so organizations can check the cloud "box" on their internal score cards. Perhaps it's a storage as a service (SaaS) integration with on-premises apps, or maybe more cloud storage as a target for backups and archive data emanating from some future storage update project, and so on.

It is time for everyone to make bolder cloud moves. Current market offerings are low-risk, easy to adopt and provide enough payback to help justify a larger transition and commitment to cloud-based storage services for even the most traditional organization. The key to success for both IT and vendors is to find that one cloud proof-point that is visible enough to business stakeholders to motivate and accelerate a larger cloud strategy.


Surprisingly, business-impacting cloud transformation is where storage folks can demonstrate significant leadership. Instead of storage lagging five years behind other domains (well, that's the impression, isn't it?), it can lead the way. To be clear, I'm talking about leveraging cloud for more than just cold backup/archive tiering, special big data projects or one-off Web 2.0 use cases. While the cloud can certainly provide all that, let's look at some examples of cloud-based storage services that hit directly at the heart of daily business operations.

MAXIMUM IMPACT CLOUD SERVICES

First, let's acknowledge that most businesses today depend on file sync-and-share services. The more functional and frictionless these products are to use, the more they get used. It's easy enough to go the SaaS route if that meets all your needs and fits your budget structure, but I would point out there are ultimately more affordable and easy-to-deploy file sync-and-share services available. Take CTERA 5.0, for example. With CTERA, organizations can not only check the box for full IT-governed file sync-and-share, they also get a host of other virtual private cloud capabilities for today's increasingly distributed and mobile businesses. If tier 1 performance is your main concern with cloud storage, check out ClearSky Data, which built a "metro area" cloud delivering sub-millisecond latency. Both CTERA's and ClearSky Data's services leverage the cloud for capacity and distribution, but serve important data at the edge for top performance.

Another example of a business-impactful cloud-based storage service is Riverbed's SteelFusion, which offers a related but different cloud storage approach than CTERA and ClearSky Data for remote and back offices. By "projecting" data center storage out to remote locations using world-class WAN optimization built into "edge hyper-converged" appliances, SteelFusion effectively turns any enterprise data center storage array into a private, cloudlike storage host supporting highly performant remote (localized) processing.

THE CLOUD AND TRADITIONAL STORAGE

As an industry, we are starting to expect cloud tiering as something arrays should natively support. Old-school vendors, meanwhile, are concerned about protecting capacity-based legacy storage revenue streams, although there is some change occurring there, too. IBM and EMC, for instance, are each working to iron out the kinks in how their traditional storage divisions work with their respective clouds. And, as another example, Microsoft Azure StorSimple, what we used to think of as a simple storage appliance, has evolved into something much more. With automatic backup and cloud tiering (to Microsoft Azure) and a new virtual StorSimple option that can also run in Azure for in-cloud recovery, small IT storage projects that at first only seem to be about cheaper and better protected local shared storage quickly help justify and accelerate larger and more impactful business-cloud transformations.


And lest you think cloud-based storage services are only for mobile users and midrange storage, Oracle's cloud can now service mainframe storage data, too. That's because, with Oracle's recent StorageTek Virtual Storage Manager System 7 release, mainframe managers can now do away with tape and use the Oracle Cloud as an enterprise mainframe storage tier.


LIGHTNING STRIKES FOR CLOUD STORAGE SERVICES

The common benefit of all these cloud-based storage services is that while they may be initiated to solve practical IT issues, once in place, they prove the wider value of the cloud and excite business executives to champion further cloud transition. While IT folks may have to adapt to operating cloud-style, the cloud enables IT to focus more on addressing business-level technology needs rather than just supporting infrastructure.

If your company has talked about the cloud, but not really gotten any momentum around major cloud business transition efforts, now is a good time to see if a simple storage refresh project can be used to stimulate and pave the way to the cloud. Bottom line: If you move the data, the business moves, too. Cloud success fundamentally transforms IT and redefines the relationship between IT and business stakeholders.

MIKE MATCHETT is a senior analyst and consultant at Taneja Group.



TechTarget Storage Media Group


STORAGE MAGAZINE
VP EDITORIAL Rich Castagna
EXECUTIVE EDITOR James Alan Miller
SENIOR MANAGING EDITOR Ed Hannan
CONTRIBUTING EDITORS James Damoulakis, Steve Duplessie, Jacob Gsoedl
DIRECTOR OF ONLINE DESIGN Linda Koury

SEARCHSTORAGE.COM, SEARCHCLOUDSTORAGE.COM, SEARCHVIRTUALSTORAGE.COM
SENIOR NEWS DIRECTOR Dave Raffo
SENIOR NEWS WRITER Sonia R. Lelii
SENIOR WRITER Carol Sliwa
STAFF WRITER Garry Kranz
SITE EDITOR Sarah Wilson
ASSISTANT SITE EDITOR Erin Sullivan

SEARCHDATABACKUP.COM, SEARCHDISASTERRECOVERY.COM, SEARCHSMBSTORAGE.COM, SEARCHSOLIDSTATESTORAGE.COM
EXECUTIVE EDITOR James Alan Miller
SENIOR MANAGING EDITOR Ed Hannan
STAFF WRITER Garry Kranz
SITE EDITOR Paul Crocetti

STORAGE DECISIONS TECHTARGET CONFERENCES
EDITORIAL EXPERT COMMUNITY COORDINATOR Kaitlin Herbert

SUBSCRIPTIONS www.SearchStorage.com

STORAGE MAGAZINE 275 Grove Street, Newton, MA 02466 [email protected]

TECHTARGET INC. 275 Grove Street, Newton, MA 02466 www.techtarget.com

©2016 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group. About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

COVER IMAGE AND PAGE 9: VASABII/FOTOLIA


Stay connected! Follow @SearchStorageTT today.
