Data Discovery Meets Big Data in the Cloud
MOVING FROM CONVENTIONAL TO NEXT-GENERATION DATA DISCOVERY

WHITEPAPER

Table of Contents
Executive Summary
Business Intelligence Hits a Wall
The Emergence of Data Discovery
Self-Service Dream Come True or IT Nightmare?
From Departmental to Enterprise
Data Discovery Augmented by the Cloud
Big Data Meets Data Discovery
Enabling Full-Spectrum Analytics
Getting “Hands-on” with Big Data
Conclusion
About 1010data

Executive Summary

Traditional business intelligence (BI) infrastructures have failed to meet the full analytical needs of business users. An emerging class of data discovery solutions brings all the benefits of BI while also removing the obstacles that traditional BI places between business users and the data they need to analyze – providing them with the direct ability to ask and answer any question of disparate big data sets with ultra-rapid response times.

The most advanced data discovery solutions can access and analyze the largest, most detailed, and most disparate data (including third-party data) to quickly yield new and valuable insights without assistance from IT. These solutions provide business users with true self-service, yet retain centralized governance and data management capabilities so all users see a single version of the truth. Further, such solutions not only support the analytic needs of individual departments, but can also extend across the enterprise as a single platform covering everything from basic operational reporting to advanced analytics like predictive modeling and machine learning. In a very real sense, then, they do what the traditional data warehouse was meant to do but never actually could.

Supporting self-service for business users on truly big data requires more storage and processing power than is typically available on a user's own hardware. But today there is no reason to think of a user's hardware as the limit of his or her domain. The cloud extends the user's reach to as much hardware and processing power as needed, allowing unlimited scalability for either a single user or a whole company. And cloud-based data discovery also facilitates additional – and perhaps even more powerful – capabilities of high strategic value: inter-enterprise collaboration among partner organizations, and the opportunity to monetize high-value data.

Because of its ability to dramatically increase the value of data and analytics for business users, organizations should consider cloud-based big data discovery as an integral part of their roadmap for enabling next-generation enterprise analytics.

Data Discovery Meets Big Data In The Cloud | [email protected] | Follow: @1010data | www.1010data.com


Business Intelligence Hits a Wall

Business Intelligence (BI) technology has been the engine of enterprise reporting for more than twenty years. The industry has matured around a set of practices and technologies aimed at delivering governed, standardized, IT-driven reporting. Yet despite the maturity of the industry, BI remains unused by more than half of its potential users in the enterprise.[i]

The primary resistance to adoption by end users lies in BI’s challenges with usability, speed, and relevance. If business users don’t find the toolset sufficiently user-friendly for framing and answering questions of their data, they won’t use it. If responses don’t come back quickly enough to enable train-of-thought decision-making, they won’t use it. If the solution can’t access the data needed to answer relevant questions, they won’t use it. What the last 20 years have proven is that the challenges of usability, speed, and relevance are innate and insurmountable for the traditional BI stack.

At the core of traditional BI’s limitations is the requirement that, before a business user can ask a question, every data element that user would want to analyze must be carefully integrated into an artfully crafted data model, often aggregated, and then placed into a second multidimensional structure with its own carefully crafted design and semantic nomenclature.

Element of the Traditional BI Stack | Challenges Introduced
Data Integration & Transformation | It takes months of work to make data sets from new operational systems available for analysis
Data Loading | Update window time limitations prevent ongoing analysis of data with rich attributes and full granularity
Data Modeling | The relationships between new data and all existing data must be fully modeled and understood before any specific analysis can be conducted
Aggregation & Sampling | Data must be pre-aggregated or sampled in multidimensional cubes or summary tables in order to deliver acceptable query performance
Extra-Enterprise Data Sets | Limited facilities exist for incorporating 3rd-party or one-off data sets (e.g. sourced from the Web) in “exploratory” analyses
Advanced Analytics & Data Science | Advanced modeling, statistical analysis, and data science require a secondary toolset, infrastructure, and specialized training beyond the enterprise BI stack

Figure 1. Each layer of the traditional BI stack introduces roadblocks to conducting analytics with the usability, speed, and relevance required by business users.


And, of course, the most valuable and pressing questions are rarely those that can afford to wait for carefully integrated data. Without fail, it is the latest data – that data supporting the newest strategy, the newest aspect of the business – which is most important. And so, the IT reporting queue fills up. Users “go rogue,” attempting to conduct analysis on their own in ways they best understand: one-off data pulls churned through ubiquitous desktop tools like Excel and Access. But these tools invariably fail to meet their needs for scalability or analytical robustness.

The Emergence of Data Discovery

Data discovery has now emerged in force as a response to the shortcomings of BI.[ii] Conceived from the ground up to enable analytics for business users, by business users, the most effective data discovery solutions are designed with three key criteria in mind:

1. Ask and Answer Any Question Leading data discovery solutions employ flexible analytic engines and intuitive interfaces that allow business users of all levels to ask questions of their data with little training. The very same solutions enable analytics of nearly unlimited sophistication by offering data interaction options ranging from UI-based to programmatic – all atop a single, unified analytic engine.

2. Performance and Scalability Analytics users recognize that the number-one criterion of analytics usability is performance. The most effective data discovery solutions employ special-purpose, non-traditional database and processing capabilities, enabling iterative, train-of-thought analysis with lightning-quick response times against broad, deep, and fully detailed data. Performance, in concert with the ability to scale to the largest volumes of big data (e.g., trillions of rows with unlimited attributes), helps ensure that business users never reach a technical limit that caps the value they can extract from data.

3. Data Mashups and Data Sharing The best data discovery solutions are designed to embrace new, unfamiliar data from any source, and enable business users to rapidly integrate it with existing data sets. Furthermore, leading solutions facilitate the sharing of data and analytics with 3rd parties outside the enterprise, enabling high-value collaboration in making data-driven decisions.

As data discovery solutions grow in popularity, a range of tools with varying specifications has been introduced to the market. Enterprises should closely examine the capabilities of the toolsets they consider to ensure their chosen solution set meets or exceeds current and anticipated analytics requirements.


Self-Service Dream Come True or IT Nightmare?

With business users enabled to access and analyze all the business data they could ever need, data discovery has the makings of a self-service reporting and analytics "dream come true" for IT departments and business teams. But for all its ability to satisfy long-awaited analytics needs, enterprises must carefully consider their approach to data discovery, lest they give rise to "rogue" solutions. A number of easily downloadable, desktop-based tools have emerged in the market, empowering end users with the interfaces and flexibility they need to analyze data from their own perspective. Some of these offerings can easily integrate with mature back-end infrastructures, but they are often deployed without an enterprise integration plan. And two-thirds of all organizations already have more than one non-approved BI solution resident in the enterprise.[iii]


Like Excel before them, such tools can lead to users keeping local copies of data, the so-called "spreadmart" problem. In addition, on a standalone basis, these desktop tools aren't necessarily designed to support the standardized metrics and KPIs required in the decision-making environments in which most enterprise users work. Without the larger consideration of how users "get on the same page" when it comes to data, desktop tools alone can create the appearance of short-term gains in productivity, with the risk that those gains are offset as users struggle to resolve conflicting results while attempting to make department- or enterprise-wide decisions based on data.

Such challenges were characteristic of the early days of desktop decision support (involving Excel and similar tools) that led to the era of BI. Along the way there evolved some valuable principles, including efficient centralized management practices, data governance, and the dream of a "single version of the truth." To enable sustainable self-service, the most effective data discovery solutions carry forward these valuable principles of BI while removing the roadblocks that traditional approaches have created between users and their data. The principles include standardized governance and centralized metrics and analyses, so that business users are all on the same page with respect to KPIs, business performance, and so on.


From Departmental to Enterprise

In fact, the era of the “mature” data discovery solution has arrived, enabling many leading organizations to extend the benefits of data discovery beyond individual departments to the entire enterprise. These enterprise-class data discovery solutions are designed with a “business user first” approach but add all the capabilities of full-fledged enterprise analytics applications, including:

• Ability to integrate and analyze any data at the lowest levels of detail
• Centralized management and access control
• Reusable, “build once, deploy anywhere” analytics components
• APIs and SDKs for application integration
• Object and cell-level permissioning and security

While the analytics challenges of individual departments continue to be the entry point for data discovery solutions, data discovery should nonetheless be enabled with an enterprise strategy in mind and, as will be discussed shortly, with a vision of going beyond the enterprise to inter-enterprise collaboration.

Data Discovery Augmented by the Cloud

First-generation data discovery offerings lived on the desktop. Next-generation solutions take full advantage of the scalability and deployment flexibility offered by the cloud. The scalability of the cloud is practically a requirement for effectively managing true big data with large volumes and disparity, not to mention the ability to deploy with reduced up-front investment, pay as you go, and scale storage and processing power on demand. But the cloud also enables a significant new range of strategic options and capabilities that go beyond scalability and financial considerations. From a functional viewpoint, cloud-based data discovery opens up a range of unique possibilities worthy of strategic examination:

1. Inter-enterprise collaboration Visionary minds have long espoused the potential benefits of tight collaboration among business partners and complementary enterprises. But the key to tight collaboration is the ability to share data and insights – a vision that has eluded most organizations. Data discovery in the cloud, however, has proven effective in enabling inter-enterprise collaboration. With the cloud serving as neutral, common ground accessible from anywhere on Earth, and with data discovery's ability to easily merge disparate data sets and display the same data within different organizations using each enterprise’s business language, the combination is helping usher in a new level of collaboration among business partners and other outside collaborators.

2. Data monetization For some businesses, the data captured by their operations is extremely valuable to business partners. Such data represents an opportunity to strengthen partnerships, increase sales and customer satisfaction, as well as to monetize their data into a new direct revenue stream. Data discovery is today enabling this type of data monetization across a range of enterprises.

3. Data marketplaces and exchanges Many types of analysis stand to benefit from the additional insights that come from third-party data feeds. In the past, such data has been challenging to load and merge with enterprise data because of the sheer data volumes, data complexity, and difficulty in proving the data’s utility before making the significant investment to acquire it. Big data discovery solutions that operate in the cloud offer business users access to value-added data feeds for integration into their analyses. With both the user’s enterprise data and the syndicated data located on a single platform, users can test new feeds on a trial basis to determine their value, and then license the feed once the potential for return-on-investment has been validated.
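The trial evaluation of a third-party feed described in item 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ZIP-code keys, the figures, and the helper names `mash_up` and `coverage` are all hypothetical and do not describe any particular product's API.

```python
# Hypothetical enterprise data: sales keyed by ZIP code (invented values).
enterprise_sales = {"10001": 1200.0, "10002": 800.0, "60601": 950.0, "94105": 400.0}

# Hypothetical third-party feed under trial: households per ZIP code.
trial_feed = {"10001": 300, "10002": 400, "60601": 190}


def mash_up(sales, feed):
    """Inner-join the two data sets on ZIP code and derive sales per household."""
    return {
        zip_code: sales[zip_code] / feed[zip_code]
        for zip_code in sales
        if zip_code in feed
    }


def coverage(sales, feed):
    """Share of enterprise ZIP codes that the trial feed is able to enrich."""
    return sum(1 for zip_code in sales if zip_code in feed) / len(sales)


sales_per_household = mash_up(enterprise_sales, trial_feed)
feed_coverage = coverage(enterprise_sales, trial_feed)
```

A coverage figure like this is one simple signal a user might weigh before licensing a feed: a feed that matches few of the enterprise's own records is unlikely to add much to the analysis.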

Cloud-based big data discovery can offer superior speed, scale, and ease of source data integration, with the ability to enable inter-enterprise collaboration and data monetization.


Leading data discovery solutions can be implemented on-premises, in the cloud, and in hybrid configurations, allowing enterprises to leverage the benefits of cloud-based offerings to the degree that meets their unique requirements.

Big Data Meets Data Discovery

The emergence of data discovery comes on the heels of another major trend that is changing the face of analytics across the enterprise: big data. At first glance, the two trends would seem at odds. Big data represents the power to fully harness massive data sets at the most granular levels, while data discovery fully empowers business users to ask any question of that data, without restriction. The first instinct from a traditional analytics viewpoint would be to develop “guard rails” to prevent the business user from generating a query that would bring a system to its knees. But doing so in the form of aggregation, sampling, and pre-calculation often results in the distortion and loss of critical information, masking the truth of what is actually occurring in the business and making it impossible to make decisions that have a direct effect at the transaction level, which is, of course, exactly where real-world business occurs.

But leading data discovery approaches fully embrace big data and its analysis in the most granular and voluminous possible forms. Big data discovery breaks free from the limitations of traditional relational databases by utilizing underlying data structures that were designed from the ground up to support the asking and answering of any analytical question.
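To make the point about aggregation concrete, here is a minimal Python sketch with invented numbers. The store names, amounts, and function names are hypothetical; the only claim is the general one that a summary table can report identical figures for two very different sets of transactions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction-level records: (store, amount). Values are invented.
transactions = [
    ("north", 20.0), ("north", 22.0), ("north", 18.0), ("north", 20.0),
    ("south", 5.0), ("south", 5.0), ("south", 5.0), ("south", 65.0),
]


def summarize(rows):
    """Pre-aggregate to one average per store, as a summary table would."""
    by_store = defaultdict(list)
    for store, amount in rows:
        by_store[store].append(amount)
    return {store: mean(amounts) for store, amounts in by_store.items()}


def largest_transaction(rows, store):
    """Answer a question that needs the detail rows, not the summary."""
    return max(amount for s, amount in rows if s == store)


summary = summarize(transactions)
south_peak = largest_transaction(transactions, "south")
```

In this sketch both stores show the same average in `summary`, yet only the detail rows reveal that the "south" figure is driven by a single large transaction.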

Enabling Full-Spectrum Analytics

Any responsible IT staff should be concerned with the uncontrolled introduction of new software and tools into the enterprise. Today’s typical analytics environment already includes relational databases, multidimensional databases, transactional databases, data warehousing appliances, extract-transform-and-load (ETL) tools, multiple BI platforms, dozens of specialized analytics applications, statistical analysis tools, predictive modeling tools, and more. But here, the most advanced data discovery solutions can actually offer a benefit. Because of the flexibility of their underlying data structures, and their emphasis on open-ended analytics, data discovery solutions can combine the analytics capabilities of multiple toolsets into a single platform, reducing the overall number of tools required to deliver a full spectrum of business analytics.[iv]

This means a range of analysis within a single platform: from descriptive (what happened?) to diagnostic (why did it happen?) to predictive (what will happen next?) to prescriptive (what should I do about it?). These single-platform environments support every level of sophistication required to conduct this analysis, from simple queries and calculations to operational reporting to multidimensional analysis to statistical functions, machine learning, and complex modeling.
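As a rough illustration of two points on that spectrum, the Python sketch below runs a descriptive summary and a simple predictive trend over the same small data set. The data and function names are hypothetical, and an ordinary least-squares line stands in for whatever modeling technique a real platform would provide.

```python
from statistics import mean

# Hypothetical monthly sales (month number, units sold); values are invented.
monthly_sales = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)]


def describe(rows):
    """Descriptive: what happened? Total and average units sold."""
    units = [y for _, y in rows]
    return {"total": sum(units), "average": mean(units)}


def fit_trend(rows):
    """Predictive: fit a least-squares line y = slope * x + intercept."""
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    s_xx = sum((x - x_bar) ** 2 for x, _ in rows)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def forecast(rows, month):
    """Predictive: what will happen next? Extrapolate the fitted line."""
    slope, intercept = fit_trend(rows)
    return slope * month + intercept
```

Both functions read the same rows, which is the essence of the single-platform argument: no export to a second toolset is needed to move from reporting to modeling.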

Getting “Hands-on” with Big Data

Business users understand their data deeply. When working with business data, users express a strong desire to be “hands-on” with their data so they can see it, feel it, consider its meaning, and use it to inspire new hypotheses and possibilities. For smaller and simpler data, no tool is more ubiquitous, and no tool has proven more effective at giving users hands-on access to data, analytics, and modeling capabilities, than the spreadsheet. Unfortunately, once data gets large and complex, or where users need to collaborate with colleagues in performing analyses, the traditional spreadsheet model becomes too rigid, too fragile, and a nightmare to manage.

First-generation data discovery tools leaned heavily and exclusively on data visualization capabilities to take the place of spreadsheets. While data visualization has proven effective in a number of use cases, users continue to return to their desktop spreadsheets – with all their limitations – to conduct much of their analysis.

The latest generation of discovery solutions has solved this puzzle, offering a new type of spreadsheet interface that lets business users get hands-on regardless of the size and complexity of data. Advanced data discovery spreadsheets are designed for sharing and collaboration with other users, and take full advantage of the latest in governance and centralized management – representing a true enterprise-class solution to the challenge of delivering hands-on access to data for business users.


Conclusion

Cloud-based big data discovery solutions have the ability to dramatically increase the value of data and analytics for enterprise business users. These solutions enable virtually unlimited access and analytical capability atop both enterprise and third-party big data, through the type of interface that makes the most sense to business users. The latest cloud-based big data discovery solutions alleviate concerns about data governance by enabling centralized management of data and analytics. Furthermore, they enable collaboration both within and beyond the enterprise, along with the ability to monetize the most valuable data streams. Enterprises that stand to benefit from new and better insights from their own data and third-party data should consider cloud-based big data discovery as an integral part of their roadmap for enabling next-generation enterprise analytics.

About 1010data

1010data provides a unique, cloud-based platform for big data discovery and data sharing. It is used by hundreds of the world's largest retail, manufacturing, telecom, and financial services enterprises because of its proven ability to deliver actionable insight from very large amounts of data more quickly, easily, and inexpensively than any other solution. Please visit www.1010data.com for more information.

[i] Dan Sommer, Gartner Business Intelligence Summit, March 2013
[ii] TechRadar: BI Analytics, Q3 2013, Forrester Research, July 2013
[iii] Global BI Benchmarks Online Survey, Forrester Research, Q2 2013
[iv] Advanced Analytics: Predictive, Collaborative and Pervasive, Gartner Research, Feb 2012
