Business Information
Insight on Managing and Using Data
Special Issue | September 2013

In this issue:
Editor’s Note: Beyond the Idle Talk on Big Data
When to Use Hadoop, and When Not To
In-Memory Finds a Place in Big Data’s Universe
Training, Planning Needed to Put Hadoop Into Play

Breaking Big
Besieged by endless big data plugging and knee-deep in Hadoop hoopla, many businesses are confused—and it’s no wonder. To make the right technology decisions and tap into real value, a keen-eyed look is needed.

EDITOR’S NOTE | CRAIG STEDMAN


Beyond the Idle Talk on Big Data


Big data is well into the IT hype cycle—to the point where even some vendors and consultants looking to capitalize on big data deployments are getting tired of the term. At the 2013 gathering of the Pacific Northwest BI Summit, an annual meeting of about 20 vendor executives and technology consultants in Grants Pass, Ore., fun was had at the expense of the big data moniker during a session on the topic. For example, Shawn Rogers of consulting and research company Enterprise Management Associates said that the classic “three V’s” definition—volume, velocity and variety—has been beaten to death. “It just defines big data as an analyst hobby,” he said, to general approval.

But there are real projects going on out there, in lots of companies. And by now, efforts to deploy big data technologies such as Hadoop and NoSQL databases may be putting your organization through the wringer. If so, it’s high time to rinse out some of the hype and look at big data management and analytics applications with a more considered eye.

There is value to be had; at least, that’s the expectation. The Pacific Northwest BI Summit attendees weren’t down on the potential benefits of using big data—just on the term itself. And in a reader survey on business


intelligence, analytics and data warehousing topics, conducted earlier this year by TechTarget, which publishes Business Information magazine, interest levels in big data analytics were relatively high, and high-minded. Forty-one percent of 540 respondents said they had active programs or planned to add one in the next 12 months. And the goals of those respondents primarily revolved around driving new business: A combined 66% cited gaining competitive advantages, better understanding customers or increasing revenue. By comparison, 27% opted for improving organizational efficiency.

The three articles in this special edition of Business Information offer insight and advice to help point the way forward. First we look at the capabilities, and limitations, of Hadoop. Next we report on the relationship between big data and in-memory analytics tools—and issues to consider before joining them at the hip. We close with tips on making Hadoop work in corporate applications from a panel of IT and BI professionals who spoke at the Hadoop Summit 2013.

CRAIG STEDMAN is executive editor of TechTarget’s SearchDataManagement.com and SearchBusinessAnalytics.com websites. Email him at [email protected].

STRATEGIES | ED BURNS

WHEN TO USE HADOOP, AND WHEN NOT TO Hadoop has become everyone’s big data darling. For now, at least, it can only do so much—and savvy businesses shouldn’t buy into the hype.

In the past few years, Hadoop has earned a lofty reputation as the go-to big data analytics engine. To many, it’s synonymous with big data technology. But the open source distributed processing framework isn’t the right answer to every big data problem, and companies looking to deploy it need to carefully evaluate when to use Hadoop—and when to look elsewhere.

For example, Hadoop has ample power for processing large amounts of unstructured or semi-structured data. But it isn’t known for its speed in dealing with smaller data sets. That has limited its application at Metamarkets Group, a San Francisco provider of real-time marketing analytics services for online advertisers. Metamarkets CEO Michael Driscoll said the company uses Hadoop for large, distributed data processing tasks


where time isn’t a constraint. That includes running end-of-the-day reports to review daily transactions or scanning historical data dating back several months. But when it comes to running the real-time analytics processes that are at the heart of what Metamarkets offers to its clients, Hadoop isn’t involved. Driscoll said that’s because it’s optimized to run batch jobs that look at every file in a database. It comes down to a tradeoff: In order to make deep connections between data points, the technology sacrifices speed.

“Using Hadoop is like having a pen pal,” he said. “You write a letter and send it and get a response back. But it’s very different than [instant messaging] or email.”

Because of the time factor, Hadoop has limited value in online environments where fast performance is crucial, said Kelly Stirman, director of product marketing at NoSQL database developer MongoDB Inc. For example, analytics-fueled online applications, such as product recommendation engines, rely on processing small amounts of information quickly. But Hadoop can’t do that efficiently, Stirman said.


No Replacement Plan

Some businesses might be tempted to try scrapping their traditional data warehouses in favor of Hadoop clusters, because technology costs are so much lower with the open source technology. But Carl Olofson, an analyst at market research company IDC, said that weighing the two is an apples-and-oranges comparison.

Olofson said the relational databases that power most data warehouses are used to accommodating trickles of data that come in at a steady rate over time, such as transaction records from day-to-day business processes. Conversely, he added, Hadoop is best suited to processing vast stores of accumulated data. And because Hadoop is typically used in large-scale projects that require clusters of servers and employees with specialized programming and data management skills, implementations can become expensive, even though the cost per unit of data may be lower than with relational databases. “When you start adding up all the costs involved, it’s not as cheap as it seems,” Olofson said.

Specialized development skills are needed because Hadoop uses the MapReduce programming framework, which relatively few developers are familiar with. That can make it difficult to access data in Hadoop from SQL databases, according to Todd Goldman, vice president of enterprise data integration at software vendor Informatica Corp. Various vendors have developed connector software that can help move data between Hadoop systems and relational databases. But Goldman thinks that for many
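The MapReduce model those scarce skills revolve around is easy to illustrate, even if real jobs are far more involved. As a rough sketch, not taken from any system mentioned here, a Hadoop Streaming-style job in Python splits work into a map phase that emits key-value pairs and a reduce phase that aggregates the values for each key:

```python
# Minimal illustration of the MapReduce programming model, in the
# style of Hadoop Streaming: the mapper emits (key, value) pairs
# and the reducer aggregates all values sharing a key. This is an
# illustrative sketch, not production Hadoop code.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit (word, 1) for every word in a line of input."""
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Hadoop sorts mapper output by key before the reduce phase runs,
    so the reducer only has to group adjacent pairs.
    """
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    lines = ["big data big hype", "big data real projects"]
    pairs = [pair for line in lines for pair in mapper(line)]
    print(dict(reducer(pairs)))
    # {'big': 3, 'data': 2, 'hype': 1, 'projects': 1, 'real': 1}
```

In an actual cluster the mapper and reducer would be separate scripts reading stdin and writing stdout on many machines at once; the point here is only the batch-oriented programming model the article says few developers know.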


organizations, too much work is needed to accommodate the open source technology. “It doesn’t make sense to revamp your entire corporate data structure just for Hadoop,” he said.

Helpful, Not Hype-Full

One viable use that Goldman sees for Hadoop is as a staging area and data integration platform for running extract, transform and load (ETL) functions. That may not be as exciting an application as all the hype over Hadoop seems to warrant, but Goldman said it particularly makes sense when an IT department needs to merge large files. In such cases, the processing power of Hadoop can come in handy.

Driscoll said Hadoop is good at handling ETL processes because it can split up the integration tasks among numerous servers in a cluster. He added that using Hadoop to integrate data and stage it for loading into a data warehouse or other database could help justify investments in the technology—getting its foot in the door for larger projects that take more advantage of Hadoop’s scalability.

Of course, leading-edge Internet companies such as Google, Yahoo, Facebook and Amazon.com have been big Hadoop users for years. And new technologies aimed at eliminating some of Hadoop’s limitations are becoming


available. For example, several vendors have released tools designed to enable real-time analysis of Hadoop data. And a Hadoop 2.0 release that is in the works will make MapReduce an optional element and enable Hadoop systems to run other types of applications.

Ultimately, it’s important for IT and business executives to cut through all the hype and understand for themselves where Hadoop could fit in their operations.

Stirman said there’s no doubt Hadoop is a powerful tool that can support many useful analytical functions. But it’s still taking shape as a technology, he added. “There’s so much hype around it now that people think it does pretty much anything,” Stirman said. “The reality is that it’s a very complex piece of technology that is still raw and needs a lot of care and handling to make it do something worthwhile and valuable.”

TECHNOLOGIES | BETH STACKPOLE

IN-MEMORY FINDS A PLACE IN BIG DATA’S UNIVERSE Big data plus memory-based analytics software can form a mutually beneficial relationship—if that’s the kind of power business users really need.

In-memory processing can serve as a high-octane fuel for supercharging big data analytics applications. But organizations should weigh factors such as additional systems infrastructure costs and the readiness of their business processes before gassing up with in-memory analytics technology.

Another key step in greasing the deployment skids is identifying big data analytics problems that have proven unsolvable or that could benefit from the performance boost typically provided by in-memory applications. “The integration of in-memory capabilities and big data boils down to use case and benefits,” said Paul Barth, co-founder of data management and analytics consultancy NewVantage Partners. “You need to consider the business value of accelerating time to answer—is it a matter of convenience, or is it a case when rapid turnaround and rapid analysis really benefits the decision-making process?”

Detecting patterns in large stockpiles of data is one


application where using in-memory analytics tools makes sense, Barth said, as are scenarios in which traditional business intelligence (BI) tools hit their limits on data volumes and processing speeds. Another example favoring in-memory technology: building an online recommendation engine that can be accelerated by running its business rules engine and analytics algorithms in memory.
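The recommendation-engine case can be sketched in a few lines. Assuming, hypothetically, that item co-occurrence counts have already been computed by a batch job, holding them in an in-memory structure lets each online lookup complete without touching disk:

```python
# Illustrative sketch of the in-memory recommendation pattern the
# article describes: precompute item co-occurrence counts offline,
# keep them in memory, and answer each request with a fast lookup.
# All data and names here are hypothetical.
from collections import defaultdict

def build_cooccurrence(baskets):
    """Batch step: count how often each pair of items appears together."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for item in basket:
            for other in basket:
                if other != item:
                    counts[item][other] += 1
    return counts

def recommend(counts, item, k=2):
    """Online step: a pure in-memory lookup, ranked by co-occurrence."""
    neighbors = counts.get(item, {})
    return [other for other, _ in
            sorted(neighbors.items(), key=lambda kv: -kv[1])[:k]]

if __name__ == "__main__":
    baskets = [["laptop", "mouse", "bag"],
               ["laptop", "mouse"],
               ["laptop", "bag"]]
    table = build_cooccurrence(baskets)
    print(recommend(table, "laptop"))  # ['mouse', 'bag']
```

The design point matches Barth’s use-case test: the expensive counting can stay in batch, but the lookup that a live application depends on is a dictionary read, which is why running it against memory rather than disk pays off.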

A Data Flood

At ContactLab, an email marketing services provider in Milan, Italy, the need for in-memory analytics capabilities became apparent when its business model shifted from broad-based marketing campaigns to a more individualized outreach approach, said Massimo Fubini, the company’s founder and director.

ContactLab, which manages an average of 60,000 to 70,000 email and outbound SMS messages daily, faced a big data challenge as it tried to sort through hundreds of millions of data points on click-throughs, website visits and other actions to analyze customer behavior and serve up relevant marketing messages on the fly. Conventional BI tools worked fine up to that point, Fubini said. But the change in business strategy changed the analytics game and opened the door to the deployment of a Hadoop system that captures the data and feeds it into in-memory analytics software—in this case, SAS Visual Analytics from SAS Institute Inc. As part of the big data environment, ContactLab also


collects data from a variety of other sources, including mobile apps, social media sites, transactional systems and external marketing information services. The plethora of data makes it harder for marketing managers and other executives at the company’s clients to know what questions to ask. Fortunately, Fubini said, the SAS tool’s combination of in-memory analytics and data visualization capabilities lets ContactLab’s analysts explore the data and come up with insights nearly instantaneously.

“This world is really changing,” he said. “In the past, people knew what data was available and would ask for specific analytics. Now the amount of data we’re collecting is huge, and the requirements around analysis are much more interactive. You can’t give someone an answer in a day or two.”
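The shift Fubini describes, from canned reports to interactive exploration, depends on keeping the working data set in memory so each new question is a scan of RAM rather than a batch job. A much-simplified sketch, with hypothetical events and field names:

```python
# Simplified sketch of the interactive, in-memory analysis pattern
# described above: load event data into memory once, then answer
# ad hoc questions with fast scans instead of batch jobs.
# The events and field names here are hypothetical.
from collections import Counter

class InMemoryEvents:
    def __init__(self, events):
        # One up-front load; subsequent queries never touch disk.
        self.events = list(events)

    def count_by(self, field):
        """Ad hoc aggregation: tally events by any field, on demand."""
        return Counter(e[field] for e in self.events)

    def filter(self, **criteria):
        """Interactive drill-down: keep events matching all criteria."""
        matched = [e for e in self.events
                   if all(e.get(k) == v for k, v in criteria.items())]
        return InMemoryEvents(matched)

if __name__ == "__main__":
    events = [
        {"channel": "email", "action": "click"},
        {"channel": "email", "action": "open"},
        {"channel": "sms", "action": "click"},
    ]
    store = InMemoryEvents(events)
    print(store.count_by("channel"))  # Counter({'email': 2, 'sms': 1})
    print(store.filter(action="click").count_by("channel"))
```

Real in-memory analytics products distribute and compress such data across servers, but the interaction style is the same: ask, narrow, ask again, without waiting a day or two between answers.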

Know Your People

Knowing your user base is another gauge for determining if in-memory analytics tools are the right fit for a big data initiative. “It’s a bit of a judgment call, so you need to understand if your users can take advantage of the additional performance,” said William McKnight, president of McKnight Consulting Group. “If you have data scientists on staff, you don’t want them sitting there drilling and drilling into data only to get frustrated [by slow response times] and walk away. With super-fast performance, you can give them the advanced analytics capabilities they need.”


Business process maturity is another issue. Tapping in-memory technology to deliver self-service capabilities to analytics users, or as a means to accelerate the performance of big data analytics processes, is an admirable goal—but it’s a lost opportunity if business users can’t quickly initiate actions based on the analytical insights that the software produces.

“The question is, are your business systems ready to take the results from the data mining exercise?” said Tapan Patel, global product marketing manager for predictive analytics and data mining at SAS. “If the end goal is to make quicker, better decisions and you’re getting insights quickly, but your CRM system is not ready to execute on near-real-time alerts with price changes or customer offers, the value [of in-memory analytics] may not be achieved.”

Cindi Howson, founder of BI Scorecard, a research


and consulting company that publishes technical evaluations of BI and analytics software, said in-memory tools have a range of potential uses, from speeding up the performance of existing databases to enabling the addition of new visual data discovery capabilities. “In-memory should be part of everyone’s analytical environment,” she said. “The question is where and how?”

RECOMMENDATIONS | JACK VAUGHAN

TRAINING, PLANNING NEEDED TO PUT HADOOP INTO PLAY

Experimenting with the vaunted open source distributed framework is one thing; using it in enterprise applications is another entirely.


While a lot of ground has to be covered to deploy the Hadoop Distributed File System and associated technologies to support enterprise uses, a roadmap outlining the path to that destination is starting to emerge.

At the Hadoop Summit 2013 in San Jose, Calif., a panel of IT leaders from various industries offered guidance for companies that want to move from experimenting with Hadoop to using it in actual applications. They said it’s easy to get started with open source Hadoop clusters—but taking the technology to the next level is more difficult. Implementers should start small, be prepared to bring in outside training help and think up front about how Hadoop-processed data will become part of operational and analytical processes, according to summit participants.

The general rush to try out Hadoop brings its own issues, said Ratnakar Lavu, senior vice president of digital innovation at retailer Kohl’s Corp. in Menomonee Falls, Wis. “You hear about all the things that Hadoop can


solve,” he said. “You get all this data, then you go off and try to solve everything that you can think of.” But Lavu’s team learned early on that small projects were good starting points with Hadoop. “It’s a whole new way of doing things,” he said. “Start with something small that you can actually manage. It’s about learning.”

Lavu also told would-be enterprise Hadoop users to be careful not to solve “problems that are already solved.” For example, existing reports that are being produced and distributed effectively don’t need to be redone in Hadoop just for the sake of changing platforms.

Hadoop first gained attention based on the efforts of systems programmers at Internet companies such as Yahoo, Google, Facebook and Twitter. But incorporating the technology into mainstream business and analytics applications takes different skills. Even Web stalwarts such as Salesforce.com have learned lessons while moving Hadoop into a support role for business decision makers.

“When Hadoop comes to mind, too often it’s only the data—how big it is. But as you add more and more users, you have to think in terms of the compute [requirements] also. It’s not just the storage,” said Ramesh Koteshwar, a business intelligence architect at Salesforce. Koteshwar anticipates that a sizable part of the company’s workforce will ask questions about data collected in Hadoop. “We expect hundreds and thousands of users on the Hadoop cluster,” he said.

Developing robust security capabilities is another part of the process of bringing Hadoop to wider use, he said.


Hadoop use at Salesforce is very much still at an exploratory stage, and end-user access and authentication are barriers that must be hurdled on the track to broader deployment. “When you really want to bring it into the enterprise, you want to make sure there are security policies and processes in place in front of the Hadoop [cluster],” Koteshwar said.

Lavu concurred that the way you fit Hadoop systems into the overall organization is important. “It’s about building the right processes and the right kind of systems and the data feeds, as well as the user training and adoption,” he said. “Those are the pieces that enable us to be successful.”

While there has been a lot to learn in Hadoop’s early days, at least some of the frontier work has been done, said Neeraj Kumar, vice president of information management and analytics at Cardinal Health in Dublin,


Ohio. Companies starting now benefit from the fact that more pieces of the related data infrastructure have been put into place. “The starters of today are going to have a leg up on us,” Kumar said. “We had to build a lot of ad hoc processes and solutions just because the previous versions of Hadoop lacked those features.”

Kumar agreed that Hadoop deployment teams should start small and should find an initial application that provides a “net-new capability” for their companies. “You need to also understand the talent base of your own organization,” he said, adding that in many cases Hadoop creates a need to bring in new skills. As a result, he advised IT managers to start thinking about Hadoop


training issues early in the project planning process. Consultants can help, Kumar said, but they aren’t the ultimate answer: “You do need talent on-site, on the ground.”

ABOUT THE AUTHORS


ED BURNS is site editor of SearchBusinessAnalytics.com; in that position, he covers business intelligence, analytics and data visualization technologies and topics. He previously was a news writer for TechTarget’s SearchHealthIT.com website, and he has also written for a variety of daily and weekly newspapers in eastern Massachusetts. Email him at [email protected].

BETH STACKPOLE is a freelance writer who has been covering the intersection of technology and business for more than 25 years for a variety of publications and websites, including SearchBusinessAnalytics.com, SearchDataManagement.com and other TechTarget sites. Email her at [email protected].

Business Information is an e-publication of TechTarget’s Business Applications and Architecture Media Group. The websites featured in this special issue are SearchDataManagement.com and SearchBusinessAnalytics.com.

Scot Petersen, Editorial Director
Jason Sparapani, Managing Editor, E-Publications
Joe Hebert, Associate Managing Editor, E-Publications
Craig Stedman, Executive Editor
Melanie Luna, Managing Editor
Linda Koury, Director of Online Design
Doug Olender, Publisher, [email protected]

JACK VAUGHAN is SearchDataManagement.com’s news and site editor. He covers topics such as data warehousing, big data management, databases, data integration and data quality. Vaughan previously worked as an editor for TechTarget’s SearchSOA.com, SearchVB.com, TheServerSide.net and SearchDomino.com websites. Email him at [email protected].

Annie Matthews, Director of Sales, [email protected]

TechTarget, 275 Grove Street, Newton, MA 02466; www.techtarget.com

© 2013 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

Cover photograph: Fotolia/Freshidea
