Why Backup Is Broken

By George Crump

CHAPTER 1 Why Backup Breaks and How to Fix It

For most organizations, backup, the process of regularly and consistently protecting production data, is fundamentally broken. As a result, these organizations have very little confidence in IT’s ability to recover data at all, let alone promptly. To try to fix broken backups, IT has thrown everything but the kitchen sink at the problem, only to end up with an even more complicated process that is also brittle and still doesn’t accomplish the primary objective.

Why Backup Breaks

Backup interacts with almost every component of the data center: every server, every network and indeed every storage system. For the backup to complete successfully, quickly and deliver a quality, recoverable payload, the solution has to understand what it is protecting and how best to protect that component. One of the primary reasons backup breaks is that legacy vendors are historically slow to protect new applications or operating environments. Legacy solutions have also been slow to adopt new technologies that would make backups and restores faster, allow the backup environment to scale further or let it fulfill a higher purpose.

The failure to protect new environments and adopt new technologies leads to backup application sprawl, or backup augmentation. New solutions come to market almost every year promising support for the latest environment or technology, but those solutions don’t support the traditional applications and hardware. The idea that IT is solving a problem by adding a new solution, or multiple solutions, only exacerbates the issue, forcing the data center to support both legacy and modernized backup solutions. It is not uncommon for IT to support three to four backup solutions and three to four high availability solutions, which doesn’t seem very modern at all.

There is also a problem of over-protection. There are many different ways to protect data today: storage system snapshots, replication, built-in application protection and, of course, traditional backup. While there is nothing wrong with a belt-and-suspenders approach to data protection, there is such a thing as too much. The issue is that there is no single control point for all of this protection, and each of the processes runs on its own. The siloed approach leads to a management challenge and to not knowing which process is the best source of data for a given recovery need.

Fixing Broken Backup

The first step to fixing a broken backup is for IT to develop an overall strategy for data protection, which includes understanding the recovery point and recovery time objectives (RPOs and RTOs) for each application or data set. Then IT can apply the appropriate protection methods to best meet those objectives. To ensure these multiple protection methods work in concert, IT needs a foundational data protection solution that provides broad coverage, typically an enterprise backup solution with a rich history of innovation and continuing support for new applications and operating environments. This foundational solution needs to understand and manage other data protection methods, such as snapshots, that can deliver more aggressive RPOs and RTOs when required.

Despite advances in technology, data protection has become more complicated and less reliable over the past several years. Part of the reason for this complication is growing data sets and increased demand for recovery. A large part, however, is the result of IT trying to respond quickly to the increased demands and departing from its core strategy. To keep pace with changes in the data center, IT needs to select an enterprise-class backup and recovery solution that can provide broad coverage while managing the other data protection methods required to meet strict RPOs and RTOs.

CHAPTER 2 Why the Cloud Breaks Backup

For organizations that realize their backups are broken, the cloud appears to be an appealing way out of the mess. The problem is most legacy backup applications don’t have integrated cloud support, so leveraging the cloud either requires a bolt-on solution or a new option. The result is the broken backup actually becomes more fragmented, going from bad to worse.

The Cloud Backup Integration Problem

The first problem with cloud backup is how the existing backup software solution supports the cloud. In truth, some legacy on-premises backup solutions can leverage the cloud in some way; it is the methods used to make and maintain that cloud connection that cause the problem. The most common method is to direct backups to a cloud gateway or appliance. These appliances typically present an SMB or NFS mount point that the backup application targets with its backups. The appliance usually has some on-premises capacity and replicates the latest backup to the cloud as bandwidth allows. In most cases, the organization maintains both its existing data protection storage and the cloud appliance’s storage, creating two storage tiers to manage. To further complicate matters, the cloud appliance is managed separately from the backup solution, so settings and day-to-day management have to be handled in both systems. Last, because the gateway and the backup software are not integrated, there is confusion over what counts as a completed backup: the backup software considers the backup complete when it is stored on the appliance, but the organization may not consider the backup done until all data is fully replicated to the cloud.

IT needs a solution that integrates directly with the cloud via an S3 connection, which allows the software to work with most major cloud providers as well as with on-premises object stores. Since the integration is built into the backup software, the software is fully aware of the data transfer process, and a separate silo of storage does not need to be created.
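
As a rough illustration of what that native integration looks like from the backup software’s side, the sketch below writes a completed backup image straight to an S3-compatible target using the boto3 library. The bucket, key and endpoint values are hypothetical, and a production implementation would add multipart uploads, retries and encryption.

```python
# Minimal sketch: writing a completed backup image directly to S3-compatible
# object storage, with no gateway appliance in the data path.
# Assumptions: boto3 is installed, credentials come from the standard AWS
# mechanisms, and the bucket/key names here are hypothetical.
import boto3

def upload_backup(image_path, bucket, key, endpoint_url=None):
    """Send a finished backup image to an S3-compatible target.

    endpoint_url=None targets AWS S3; pointing it at another provider's
    or an on-premises object store's S3 endpoint changes the destination
    without changing the backup workflow.
    """
    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    s3.upload_file(image_path, bucket, key)
    # Only after the upload returns should the software mark the cloud copy
    # as complete, which avoids the "is it done yet?" ambiguity of a gateway.

# Example call (hypothetical names):
# upload_backup("/backups/vm42-2024-01-15.img", "dp-backups",
#               "prod/vm42/2024-01-15.img")
```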

The Multi-Cloud Backup Problem

Cloud integration also solves the second problem facing IT professionals looking to leverage the cloud: how to leverage multiple clouds. Many data protection solutions that support cloud storage, especially cloud-only solutions, support just one cloud provider.

IT should look for a backup solution that can support multiple clouds. Then the organization has the flexibility to move between clouds to gain a pricing advantage or to take advantage of specific services. Native S3 support built right into the backup solution enables the organization to move between cloud suppliers, or even send certain backups to one cloud provider and a different set to another.
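
A minimal sketch of how that flexibility might be expressed as policy follows, assuming S3-compatible targets. The provider names, endpoints, buckets and dataset categories are illustrative placeholders, not real services or a recommendation.

```python
# Sketch: routing different backup sets to different S3-compatible targets
# from a single backup solution. All provider names, endpoints and buckets
# are illustrative placeholders.
CLOUD_TARGETS = {
    "provider_a": {"endpoint_url": None,  # default public-cloud S3 endpoint
                   "bucket": "dp-primary"},
    "provider_b": {"endpoint_url": "https://s3.provider-b.example",
                   "bucket": "dp-archive"},
    "on_prem":    {"endpoint_url": "https://objects.dc1.example.local",
                   "bucket": "dp-local"},
}

# Hypothetical policy: databases land in one cloud, file shares in another.
BACKUP_POLICY = {
    "databases":   "provider_a",
    "file_shares": "provider_b",
    "endpoints":   "on_prem",
}

def target_for(dataset: str) -> dict:
    """Resolve which S3 target a given backup set should be sent to."""
    return CLOUD_TARGETS[BACKUP_POLICY[dataset]]

# target_for("file_shares") -> the provider_b endpoint and bucket
```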

The Hybrid IT Problem

Hybrid IT means applications can run on-premises or in the cloud and can move freely between the two locations without the data protection team knowing about it. It also means unique data is created in the cloud, and that data must be protected. Also consider that organizations more than likely have Software as a Service (SaaS) applications; the data created by these applications also needs to be protected.

The Cloud Data Protection Impact

Instead of simplifying the data protection process, integration of the cloud makes it more complicated if the underlying data protection solution does not offer native support for it.

Most legacy backup and recovery products get their cloud support by bolting on an appliance, whereas most “modern” solutions don’t provide the robust support required for on-premises protection and cannot change cloud targets, especially between backups. Most legacy and modern backup products also don’t offer support for backup within the cloud, either for the organization’s cloud-native applications or for the data created by SaaS environments.

What IT Needs

IT needs a backup solution with robust on-premises capabilities and native cloud integration. If that integration uses the S3 protocol, the software can support a variety of cloud locations and vary those locations between backups. The backup solution should also be able to protect cloud-based data, both for cloud-native applications and for SaaS data sets. Ideally, cloud storage integration and cloud data protection should come together in a single backup solution, with the flexibility to set policies that manage the entire process.

CHAPTER 3 Why Virtualization STILL Breaks Backup

With the broad adoption of server virtualization by data centers, it became apparent that protection of the new environment was inadequate. Organizations faced a choice between in-guest protection and very fragile off-host architectures. As adoption continued, VMware and others made significant strides in providing robust APIs that backup vendors could use to protect virtual infrastructures more effectively. Despite this progress, most IT teams still cite protecting the virtual infrastructure as a top pain point in their data protection strategies. These environments need more than just basic protection; they need an intelligent protection solution that can be as flexible as the virtual environment itself.

Dealing with VM Mobility

One of the earliest advantages of a virtualized environment was its ability to migrate virtual machines (VMs) from one host to another. Now, though, virtual machines can migrate to hosts in other data centers or in the cloud. Data protection solutions need to be able to track a VM’s movement and protect it appropriately.

Another advantage of virtualization is the ability to almost instantly create another virtual machine and install a new application. Unless the organization has a strict change control process in place, the data protection team may be unaware of the existence of new applications and data until an application crashes and loses its data. Then everyone knows about it. Coinciding with the instant-creation problem is the problem of VM sprawl: after a virtual machine is created and used, it is frequently left active but unused, still consuming resources and still needing protection.

Modern backup applications need to go beyond basic backup of known VMs and help the IT team identify new VMs, even backing them up automatically. These applications also need to notify the backup team of a virtual machine that has not changed over time, at which point the IT team can decide to archive the VM, delete it or at least exclude it from the backup set.
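
The following is a simplified sketch of that discovery logic, using hard-coded stand-in data; in practice the inventory would come from the hypervisor’s management API and the protected list from the backup software’s own catalog.

```python
# Sketch: comparing the hypervisor's VM inventory against the backup catalog
# to flag unprotected new VMs and long-idle VMs. The inventory and protected
# set below are hard-coded stand-ins for data a real implementation would
# pull from the hypervisor API and the backup software's records.
from datetime import datetime, timedelta

today = datetime(2024, 3, 3)
inventory = {                        # vm name -> last observed change
    "web-01":  datetime(2024, 3, 1),
    "dev-tmp": datetime(2023, 6, 10),   # created ad hoc, then forgotten
    "erp-db":  datetime(2024, 3, 2),
}
protected = {"web-01", "erp-db"}     # VMs already covered by a backup policy

new_vms  = [vm for vm in inventory if vm not in protected]
idle_vms = [vm for vm, changed in inventory.items()
            if today - changed > timedelta(days=180)]

print("Add to backup policy:", new_vms)            # ['dev-tmp']
print("Archive/exclude candidates:", idle_vms)     # ['dev-tmp']
```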

The Virtual-Only Backup Problem

The early challenges with protecting virtual environments led to the creation of VMware-specific data protection solutions. Not encumbered with integrating virtual data protection into existing code bases, these new vendors were able to jump out to an early lead and provide features that legacy vendors did not, at least initially. The problem is that these VM-only data protection solutions exacerbated an emerging problem in the data center: data protection sprawl. Data protection sprawl occurs when an organization uses multiple, non-integrated techniques to protect data. These can include snapshots, in-application protection (database dumps), enterprise data protection and now protection specific to the virtual environment.

The justification from most of these VM-specific data protection solutions is that the data center will one day be 100% virtualized. While adoption of virtualization is very high, most enterprise data centers will never be 100% virtualized. There will more than likely always be some physical systems, often running in their own clusters, that still need to be protected. Virtualization is not necessarily the end game for the data center. Organizations will continue to count on physical systems for quite some time, and virtualization itself will evolve, with new hypervisors and new takes on the technology, like containers. Enterprise solutions that protect physical and virtual environments, and that have a proven track record of adapting as new technologies come to market, may be a better fit for the data center. Additionally, they can ensure the current challenges presented by virtualization don’t break the backup process.

CHAPTER 4 Why Do Point Products Break Backup

One of the most alarming trends in data protection is the number of different solutions data centers are deploying and managing so they can keep up with the organization’s recovery expectations and budget realities. The demand for rapid recovery leads organizations to deploy separate solutions for each platform they support, just to get one particular feature. It also leads them to deploy unique hardware solutions in an attempt to drive down costs. The result is data protection sprawl.

Instant - The New RTO

The biggest challenge facing IT is the demand for near-instant recovery. To meet the instant recovery demand, IT professionals leverage storage system snapshots, application replication solutions and backup products. All of these methods have their role, but each must be operated and managed independently of the others. Additionally, there is a lot of redundancy between the solutions, as the data protection process doesn’t cascade between them. The result is that IT professionals have multiple copies of data (often the same data) at different points in time, with no good system to determine which one is the best version from which to recover, given the recovery scenario.

To make matters worse, the sprawl of storage systems and operating environments leads to each using its own flavor of these three data protection methods. Each storage system has its own snapshot capability with its own interface and scripts, as does each operating system or environment. Then, each environment often gets its own backup application, and each application requires a separate storage system, again with unique interfaces. Very quickly, the data center is managing dozens of combinations of techniques to meet its data protection requirements. Recovery becomes a fire drill where IT has to pull off miracle recoveries instead of taking an organized approach that instills organizational confidence in the process.

Bringing Order to Chaos

Enterprise-class data protection software has to evolve to keep up with the new instant RTO challenge. First, it needs to leverage and manage the existing processes. For example, backup software should work with snapshots by triggering them as part of its data protection schedule and presenting them to the backup application for protection. This integration should include the ability to determine the best available data copy from which to recover; the application should use the most recent viable snapshot. Additionally, the backup application should leverage its online backup capabilities so that the storage system captures application-consistent snapshots instead of crash-consistent snapshots.

The next step is consolidating data protection hardware to drive down the overall cost of deployment. There are two components to data protection hardware: the servers running the backup software and the hardware that stores the protected copies of data. Data protection servers have become increasingly taxed in recent years as organizations expect more from the backup process, which now includes copy data management, instant recoveries and advanced search. Data protection storage needs to be increasingly scalable and even more capable. The systems need to sustain the overhead of deduplication and compression while at the same time delivering acceptable performance in live recovery situations.

Software Defined Data Protection

A viable solution is for backup software to implement its own storage software capabilities specifically designed for data protection and to integrate all of those features into the solution, including running the data protection software itself. The result is a converged data protection infrastructure: a single data protection cluster (a collection of servers networked together, all sharing resources) that can host all protection and secondary storage operations while scaling to meet future demands. IT needs to be careful not to be enticed by every new data protection software and hardware solution that becomes available. Obviously, there will be times when it needs to implement a point solution, at least temporarily, to meet a specific need, but generally it should push everything to a more centralized enterprise-class solution that can deliver a consolidated foundation for both hardware and software.

CHAPTER 5 Hyperconvergence is NOT Backup

One of the riskiest claims a vendor ever makes is “we’ve eliminated the need for backup” and hyperconverged infrastructure (HCI) vendors make this claim frequently. Good data resiliency is not good backup, and in fact, some of the work that vendors do to make these claims actually hurts what the system was really designed to accomplish.

How Can HCI Solutions Claim Self-Protection?

HCI solutions are built on a group of servers, typically clustered together via a hypervisor. Within that cluster runs a software defined storage (SDS) solution. Part of the storage software’s responsibility is to distribute data across the cluster for resilience and mobility. Today, most HCI solutions achieve this resiliency via a technique called erasure coding (EC). Erasure coding segments data while writing it and distributes the segments across the servers (nodes) within the cluster. Part of the segmentation is the creation of parity data, which is placed on one or more nodes. The amount of parity roughly equates to the number of nodes that can fail before the HCI solution faces an outage or data loss.
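
A stripped-down illustration of the erasure coding idea follows, using a single XOR parity segment so that one lost node can be survived. Real HCI systems use Reed-Solomon-style codes with configurable data and parity counts, so treat this only as a sketch of the concept.

```python
# Illustration of erasure coding's core idea: split a write into segments
# across nodes and add parity so a node loss is survivable. This sketch uses
# one XOR parity segment (tolerates one failure); real systems use more
# general codes with multiple parity segments.
def encode(data, data_nodes):
    """Split data into data_nodes segments plus one XOR parity segment."""
    seg_len = -(-len(data) // data_nodes)            # ceiling division
    segs = [data[i * seg_len:(i + 1) * seg_len].ljust(seg_len, b"\x00")
            for i in range(data_nodes)]
    parity = segs[0]
    for seg in segs[1:]:
        parity = bytes(x ^ y for x, y in zip(parity, seg))
    return segs + [parity]                           # one segment per node

def rebuild(segments):
    """Reconstruct a single missing segment (marked None) from the survivors."""
    missing = segments.index(None)
    survivors = [s for s in segments if s is not None]
    rebuilt = survivors[0]
    for seg in survivors[1:]:
        rebuilt = bytes(x ^ y for x, y in zip(rebuilt, seg))
    segments[missing] = rebuilt
    return segments

# Example: four data segments plus one parity segment tolerates one lost node.
placed = encode(b"backup payload for one write", 4)
placed[2] = None                                     # simulate a failed node
restored = rebuild(placed)
assert b"".join(restored[:-1]).rstrip(b"\x00") == b"backup payload for one write"
```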

At this point in the protection process, no HCI vendor will claim they’ve eliminated the need for backups, or at least they shouldn’t. What gives the HCI vendor the guts to make the “eliminate backup” claim is what comes next. The storage behind most HCI solutions can take, in most cases, nearly unlimited snapshots. This high number of potential snapshots provides the solution with a basic point-in-time capability. The customer can, in theory, roll back a virtual machine to any point in time as long as there is a snapshot available. But even with all these snapshots, the solution is still susceptible to a disaster impacting the entire system or site.

Some HCI solutions will take the protection claim a step further and claim that they can actually create zero-capacity-impact backups on the cluster. The customer actually makes a copy of data, but within the cluster that copy is deduplicated, and since it is deduplicated, the copy requires almost no additional storage space. In some ways, this copy is better than a snapshot, since the backup copy is not dependent on the primary data volume as a snapshot would be. But the deduplicated copy is totally dependent on the deduplication metadata staying intact, so there is a degree of vulnerability.

The second HCI confidence booster is asynchronous replication of those snapshots. The HCI solution collates its snapshots and periodically sends the delta between snapshots to another location. In many cases, the solution can also take a different set of snapshots of the landed data. At this point, many HCI solutions consider the data protected and make their claim of “eliminating backup”.

Cluster + Erasure Coding + Snapshots + Replication = Backup?

While it is certainly true that HCI solutions deliver a lot of redundancy and data resiliency, none of these protection steps provides a true backup. A true backup is a secondary copy of data on a completely separate storage infrastructure, preferably running separate storage software and stored on a different type of media.

The Problems with HCI “Backup”

The first problem with “HCI backup” is that it only protects data within the HCI environment. Most data centers are not “all in” with their HCI solution, and in fact, most HCI installations, at least to this point, target specific projects, so the reality is that most of the environment sits outside the HCI environment.

The second problem with HCI backup is that it is completely dependent upon the storage software. If a problem occurs in the software, data is exposed. A very real example is the deduplication process. Deduplication works by identifying and removing redundant segments of data. To work, deduplication requires a fairly sophisticated metadata table. If that table is corrupted or fails and there is no separate backup copy of the table, then more than likely it will be necessary to recreate all of that data.

The third problem is not unique to HCI but applies to any solution or organization that counts on snapshots as its primary point of recovery. Most snapshots appear to the system as another copy of a volume. The software that takes the snapshots does not provide any ability to find specific data within a snapshot, and certainly does not provide any ability to find data across snapshots. This limitation is problematic if, for example, an administrator is looking for a specific version of a specific file that may be in one of one hundred, or one thousand, snapshots.

A final problem with HCI as backup is that it requires the customer to stand up an HCI environment at both the primary data center and the secondary/DR site. Customers may just want to replicate data to a central repository at the DR site and only instantiate certain applications during a disaster. Again, keeping in mind that most organizations are not fully committed to HCI, standing up a second cluster to support applications that more than likely will not be the first ones recovered in a disaster is a waste of DR budget.

In the end, enterprise backup still makes the most sense for the majority of customers when deciding how to protect their HCI environments. While an organization that is 100% on a single HCI infrastructure may get away with HCI as backup, most organizations will find the lack of copy diversity, the inability to find data and the additional costs at the DR site too problematic to consider this option seriously. Instead, customers should look for enterprise solutions that provide direct support for the HCI solution. The advantage is that they can then leverage a single backup solution to protect their entire data center while avoiding the need to run a different data protection application for each unique environment.
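
Returning to the second problem above, the toy content-addressed store below shows why the deduplication metadata is such a sensitive dependency: chunks are written once under their hash, and a file exists only as a list of hash references, so losing that reference table makes the data unrecoverable as files even though every chunk is still on disk. This is a conceptual sketch, not any vendor’s actual implementation.

```python
# Toy content-addressed deduplication store: chunks are stored once under
# their hash; a file is only an ordered list of hash references. Lose the
# reference list (the metadata) and restores become impossible, even though
# every byte is still present in chunk_store.
import hashlib

CHUNK = 4096
chunk_store = {}   # sha256 digest -> chunk bytes (each stored once)
file_index  = {}   # file name     -> ordered list of digests (the metadata)

def backup(name, data):
    refs = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)   # dedup: keep one copy per chunk
        refs.append(digest)
    file_index[name] = refs

def restore(name):
    # Restores depend entirely on file_index surviving intact.
    return b"".join(chunk_store[d] for d in file_index[name])

backup("report.docx", b"example payload " * 1000)
assert restore("report.docx") == b"example payload " * 1000
```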

CHAPTER 6 Why Scale-Up Architecture Breaks Backup

The backup architecture will likely store 5 to 10 times the amount of data that the primary storage system holds. Additionally, organizations are looking to use backup data for far more than just an insurance copy; they want to leverage the backup copy for testing and development, reporting and business analysis. To keep up with this growth, both the software and hardware components of the backup architecture need to scale, ideally in lock-step with each other. The problem is that most architectures don’t scale, and those that do don’t complement each other in the way they scale.

Software Scaling Issues

The basis for most backup solutions is the concept of a single backup server, which is the primary control point of the backup process. In many cases, it is necessary to send all backup data to this single server. The server also houses all the metadata, such as the backup catalog and media index. Some enterprise solutions also require sending data to multiple secondary servers, often called media servers or slave servers, which act as an alternate point to send data. Media servers protect the primary server from data overload by directing the incoming data to storage arrays or tape libraries they control. This enables the primary backup server to focus on job management and maintaining metadata and indexes.

The software faces two fundamental scaling problems. First, in most cases, it is totally dependent on the underlying data protection storage hardware, which, as we will see, has scaling issues of its own. Most solutions are also very limited in their ability to distribute the metadata and indexing information that the backup software must maintain. Second, at some point, the data about the data becomes too large for the backup solution to maintain, forcing IT either to implement another primary backup server module or to prune history from the current system.

Scaling Data Protection Storage

Most data protection architectures today count on hard disk-based storage for most, if not all, of their protection storage. Making sure the raw capacity of that storage can grow to meet the demands of the enterprise is critical. It is also important, however, that protection storage grows to meet the performance demands of the process. In the past, the primary performance concern was how quickly protection storage could ingest data received from the various backup and media servers the enterprise had implemented. While ingest performance is still critical, a new measure of performance is now required to meet the desire to use backup storage to host recovered volumes and support copy data tasks like test-dev and reporting.

Scale-up hardware solutions have a predefined wall in terms of capacity and performance. Once that wall is hit, the organization either needs to purchase an additional data protection storage solution or perform a complex forklift upgrade to a new system and migrate old backup jobs to it.

The Cost Problem

The scaling limitations of both hardware and software lead to unpredictable costs. When either the backup software architecture or the data protection storage reaches its capacity or performance limits, the “solution” is to buy an additional system or to upgrade the existing one. In either case, these “solutions” require the purchase of new hardware and more than likely additional software, not to mention additional support contracts. These additions are not small increments but major purchases that in most cases weren’t properly planned for.

When scale-up hardware or software hits the wall, it puts the entire data protection process at risk, and data protection stops until IT addresses the limitation. In most cases these limitations appear without warning, and IT needs to scramble either to prune historical copies or to purchase additional hardware and software. IT needs to look for solutions that can incrementally scale both the software and hardware components of the data protection architecture as needed, and that provide some predictive analysis of resource utilization in order to plan for the next increment.

CHAPTER 7 Why Snapshots Break Backup

Snapshots are an incredibly popular and powerful way to protect data. They enable an organization to protect data rapidly without consuming disk capacity until additional changes occur. Most storage systems can also roll back snapshot images for rapid restores and even allow IT to “drill inside” a snapshot to recover specific files. But snapshots are not perfect, and they shouldn’t be used as the organization’s only data protection strategy any more than RAID should.

The Problems with Snapshots

There are several problems with snapshots as the only data protection strategy for the organization. The first and most obvious is that the protection is occurring on the same system as the production data, which is no different from a user copying a set of files from one directory to another. Any failure in the storage system results in loss of the snapshots.

Second, while snapshots have zero capacity impact when they are taken, snapshot capacity requirements increase as data in the production volume changes. That capacity consumption lands on the organization’s most expensive tier of storage: production. As a result, while most organizations will take snapshots multiple times per day, they won’t hang on to each snapshot for more than a week or so before offloading it to secondary storage. However, offloading to secondary storage is a separate process; most storage system vendors have no ability to copy data contained in snapshots directly to a cost-effective secondary storage tier.

Third, there is no cataloging of snapshots, so finding a specific version of a file requires knowing which snapshot has the particular version that the user wants, then mounting that snapshot and manually copying the file from the snapshot back to production storage. Snapshot technologies lack a search capability that lets IT search for a file by name or other metadata and see each version of that file and which snapshot contains it.

Fourth, each storage vendor has its own method of executing and maintaining snapshots, and since most data centers have anywhere from three to six different primary storage systems, counting on snapshots for data protection creates a systems management challenge. IT needs training on each storage system’s snapshot interface, each of which requires a unique set of skills to manage. As a result, IT has to work with three to six GUIs just to monitor snapshot success.

Finally, while storage system vendors have improved their integration with operating systems to ensure quality, consistent copies, they often don’t integrate at the application layer to guarantee application-consistent copies.
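
The cataloging gap in the third problem is easiest to see as code. The sketch below shows the kind of index a backup application can maintain across snapshots; the snapshot IDs, file paths and the idea of a list_snapshot_files() enumerator are all hypothetical stand-ins.

```python
# Sketch of the catalog capability snapshots lack: index each snapshot's file
# metadata once, then answer "which snapshots hold a version of this file?"
# without mounting anything. list_snapshot_files() is a hypothetical stand-in
# for whatever mechanism enumerates a snapshot's contents.
from collections import defaultdict

catalog = defaultdict(list)   # file path -> [(snapshot_id, mtime, size), ...]

def index_snapshot(snapshot_id, files):
    """files: iterable of (path, mtime, size) tuples, e.g. from list_snapshot_files()."""
    for path, mtime, size in files:
        catalog[path].append((snapshot_id, mtime, size))

def find_versions(path):
    """Every known version of a file and the snapshot that holds it, oldest first."""
    return sorted(catalog.get(path, []), key=lambda version: version[1])

# Example with made-up snapshot contents:
index_snapshot("snap-0900", [("/finance/q3.xlsx", "2024-03-01T09:00", 120331)])
index_snapshot("snap-1300", [("/finance/q3.xlsx", "2024-03-01T12:47", 131040)])
print(find_versions("/finance/q3.xlsx"))
```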

Snapshots as a Complement to Backup

Snapshots should complement the backup process, not replace it. Snapshots should be the source from which the backup application copies data, to avoid impacting production data. Some backup solutions will leverage snapshots through the use of scripts, but scripting introduces its own set of problems; for example, each storage software or application upgrade means the script has to be re-tested and potentially re-written. Backup should go further in its support of snapshots and not require scripts to get there. A modern data protection solution should actually manage the snapshot process, creating a centralized console for all data protection operations. Integrating snapshot management into the data protection software enables the software to trigger the snapshots, ensure the application is in a ready state and let IT administrators take advantage of the data protection solution’s built-in cataloging capability, making it much easier to find recovery data.
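
A skeleton of that orchestration, in the order just described, might look like the following; the quiesce, snapshot and resume helpers are placeholders for application- and vendor-specific APIs rather than calls to any real library.

```python
# Skeleton of snapshot orchestration driven by the backup software rather
# than by scripts: quiesce the application, trigger the array snapshot,
# record it in the backup catalog, then resume. The three helpers are
# placeholders for vendor/application-specific APIs.
import time

def quiesce(app):            # placeholder: e.g. flush buffers, hold writes
    print(f"quiescing {app}")

def array_snapshot(volume):  # placeholder: the storage vendor's snapshot call
    snap_id = f"{volume}-{int(time.time())}"
    print(f"snapshot {snap_id} created")
    return snap_id

def resume(app):             # placeholder: release the application
    print(f"resuming {app}")

def protect(app, volume, catalog):
    """Application-consistent snapshot, recorded in the backup catalog."""
    quiesce(app)
    try:
        snap_id = array_snapshot(volume)   # a crash-consistent copy would skip quiesce
    finally:
        resume(app)                        # keep the quiesce window short
    catalog.append({"app": app, "volume": volume,
                    "snapshot": snap_id, "taken": time.time()})
    return snap_id

catalog = []
protect("erp-db", "vol-finance", catalog)
```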

The challenge for integration is ensuring support for enough primary storage vendors to make the effort worthwhile. If the backup software supports only one or two primary storage vendors, a single, centralized view is unrealistic. The key is for the data protection software to take the lead and create a framework that makes integration an easy task for storage system vendors.

Snapshots are an ideal way to recover from application corruption or user mistakes. Snapshots, however, are not backups, but they can aid in the creation of protected copies of data on secondary storage. The challenge organizations face is making the connection between the two. Data protection solutions should take the lead and create an easy way for vendors to interface with the backup software, in order to create a single data protection console that spans a variety of primary storage systems.

CHAPTER 8 Why Endpoints Break Backup

The Endpoint Backup Silo Problem

According to several recent studies, about 30% to 40% of an organization’s data is stored uniquely on endpoints (laptops and devices). This unique data is not stored or backed up anywhere in the data center and is outside of the organization’s control. If one of these laptops fails or is stolen, the organization loses that data forever. Compounding this problem, endpoints are also the most vulnerable devices; they endure harsh treatment and are very susceptible to loss or theft. Despite the very real potential for loss of organizational data, most organizations don’t include endpoints in their data protection strategy.

The classic endpoint “protection” strategy was to tell users to copy data to the enterprise’s network attached storage (NAS) or file server storage. Not only is “just copy” a bad data management practice, it is also very rarely done and doesn’t address the remote office, road warrior or tablet use cases. Endpoint protection of laptops, along with desktops, needs to be an integral part of the general data protection strategy. The problem is that endpoint protection solutions provided by legacy vendors have been so bad that the “just copy” methodology was actually more reliable. The lack of a reliable endpoint data protection solution has forced organizations to look for external solutions from startups or endpoint products designed for the consumer market.

The Endpoint Data Problem

The first problem with endpoint data protection is mobility. Endpoints, by definition, are on the move. They are rarely, and in many cases never, connected to the corporate network. Endpoint data protection solutions therefore also need to be able to keep endpoints protected across very limited bandwidth.

Additionally, endpoints are largely out of IT control, which means that any solution installed on those laptops requires user acceptance. If the solution slows users down, or worse, keeps them from working, they will go out of their way to disable it, and it is difficult to stop a user one thousand miles away from disabling the backup software. Lastly, there is the concern about scale. Enterprises may have thousands of endpoints, with each user carrying two or three devices. The software solution needs to scale not only to handle the number of simultaneous inbound backup jobs but also to provide the capacity required to store the protected copies of all of those systems.

All of these challenges, and the lack of a quality solution from enterprise vendors, have created a second class of solutions targeted directly at endpoint backup. Like other environment-specific protection solutions, an endpoint-only solution may have the advantage of focusing on the specific problem, but at the cost of further complexity, since the organization has now added yet another tool, and further expense. Endpoint data protection solutions in particular are problematic since they often force organizations to use the cloud for backup storage and don’t provide options as to which cloud provider can be used.

The Enterprise Endpoint Solution

The real answer is to look for a solution that integrates directly into the overall enterprise data protection process, centralizing the management and storage of data and going beyond the “just copy” method. Leveraging an enterprise solution should also allow the customer to choose whether or not to send data to the cloud, as well as which cloud, instead of forcing their hand. The key is to look for an endpoint protection solution that meets the organization’s and users’ needs while integrating into the overall protection umbrella. While many enterprise data protection companies have given up on endpoint protection, a few have advanced their initial offerings and now provide solutions that are both user friendly and enterprise capable.

CHAPTER 9 Why Scale-Out Architectures Break Backup

In a recent blog, we discussed how scale-up architectures break backup, forcing customers into expensive and disruptive forklift upgrades. Scale-out architectures are the logical solution to the scale-up challenge, but they create problems of their own. IT planners need to look for the right type of scale-out architecture.

Backup Software Doesn’t Scale Out

One of the major challenges facing scale-out backup architectures is that, for the most part, only the backup storage hardware scales out. Scaling out backup software is more vital than ever. First, there is the obvious impact of backing up more files, more applications and more total capacity as production storage volumes continue to grow at an unabated pace. Beyond that, organizations now expect backup software to evolve and do more than just backup. They want to leverage backup data and storage to create and host on-demand virtual volumes for rapid recoveries, and to use those virtual volumes to deliver capabilities like reporting, testing and analytics, often called copy data management. Additionally, many backup solutions now deliver functions like deduplication, compression and replication that were previously only available in data protection hardware. Organizations are smart to use the backup solution and the data it contains as more than just an insurance policy; leveraging that data for more use cases makes cost justification of the investment easier while lowering investments in other areas of the organization.

The problem is that most enterprise software solutions are not able to scale to meet these new, more widespread demands. They can barely keep up with data growth, let alone deliver new features at acceptable performance levels. This is primarily because most software solutions are scale-up designs. Some vendors claim to offer a scale-out architecture, but in reality they offer a two-tier architecture with a master server and media servers, where the master server typically can’t distribute the processing of jobs, the data or the management of the backup metadata.

Scale-Out Backup Storage Doesn’t Scale Right

Scale-out storage has been around for a while, but it generally has not functioned well as capacity for the backup process. The first challenge is that scale-out storage solutions typically bottleneck on ingest. One or two nodes are responsible for initially receiving data, capturing metadata and then distributing the actual data across the nodes in the cluster. Having one or two control nodes works fine for production data, which doesn’t have the requirement for rapidly ingesting data that data protection storage does, and the control-node design is what makes cluster-wide deduplication possible.

An alternative is a loosely coupled cluster of nodes, each of which can receive data independently, solving the bottleneck issue while the software still provides a single management point. The challenge with this approach is that it typically requires manually sending backup jobs to a specific node and then re-directing them to another node if there is a change in policy. Additionally, these loosely coupled clusters provide media protection (RAID) on a per-node basis, reducing capacity efficiency. Finally, they can’t provide cluster-wide deduplication, further reducing capacity efficiency. This means that if two servers with similar data back up to the same cluster but are directed at different nodes, the two nodes will store redundant data between them.
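
The capacity cost of that per-node deduplication scope can be shown with a tiny, made-up example: two nodes receiving backups of similar servers each keep their own copies of the chunks they have in common, while cluster-wide deduplication would store each unique chunk only once.

```python
# Illustration of why per-node deduplication stores more than cluster-wide
# deduplication: two nodes receiving backups of similar servers each keep
# their own copy of the chunks they share. Chunk labels are made up.
node_a = ["os1", "os2", "app1", "db1"]     # chunks landing on node A
node_b = ["os1", "os2", "app1", "web1"]    # similar server, different node

per_node_copies     = len(set(node_a)) + len(set(node_b))   # 4 + 4 = 8
cluster_wide_copies = len(set(node_a) | set(node_b))        # 5 unique chunks

print(per_node_copies, cluster_wide_copies)   # 8 vs 5: redundancy between nodes
```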

Fixing Scale-Out Protection

A solution is for IT planners to look for backup solutions with a complete scale-out strategy, both software and hardware, where the backup software essentially provides its own hyperconverged, scale-out file system. This hyperconverged approach enables the backup software to execute functions across multiple nodes while also distributing data across those nodes. It reduces costs because storage capacity is internal to the servers, and data efficiency is high because media failure protection is cluster-wide. The backup software, which now has plenty of CPU power to take on the responsibility, handles deduplication prior to writing the data to protection storage. As a result, the hyperconverged data protection solution can deliver global deduplication while still having the horsepower to deliver rapid recovery and other copy data services.

Both legacy scale-up architectures and scale-out architectures face big challenges from the modern enterprise data center. Legacy designs were created primarily to handle the capacity problem, but today backup is being asked to do much more than just store copies of data. It is time for enterprises to rethink how data protection software and hardware integrate and work together to deliver a more comprehensive and truly scalable solution.

CHAPTER 10 How to Fix Broken Backup

The single biggest challenge facing backup, ironically, has nothing to do with backup. Backup’s biggest challenge is the pace at which the organization evolves and the speed at which it expects IT to keep pace with that evolution. Combine that with a general lack of IT personnel dedicated to the process of protecting data, and the process unwinds rather quickly. In response, IT throws hardware and software band-aids at the problem, leaving most organizations’ data protection strategies a fragmented mess.

One of the reasons enterprise data protection solutions have lost favor to more environment-specific solutions is that it just seems easier to react to a current problem and apply the most readily available band-aid. A commitment to a comprehensive strategy requires IT to look for ways to have the current solution meet the new protection demand, or to look for solutions that can integrate with the comprehensive solution.

Part of the problem is the enterprise data protection solutions themselves. These solutions try to do too much, such as replacing known-good alternatives like storage system snapshots. Instead of creating a set of redundant features, the data center needs a different strategy, one that incorporates the foundational data protection and broad platform coverage of traditional enterprise solutions but can also integrate with and control alternative solutions. This new strategy is called Comprehensive Data Protection.

Developing a Comprehensive Data Protection Strategy

A comprehensive data protection strategy is one where most, if not all, of the data protection process is scheduled, managed and monitored through a single solution. While the comprehensive data protection strategy more than likely provides the foundational data protection every enterprise needs, it should also integrate with other protection technologies, like snapshots, deduplication and replication, in order to centralize all data protection routines into a single console.

The Race to New Environments

The only constant in the data center is change. Twenty years ago, the focus of data centers was on scaling client-server databases. Then attention turned to virtualization, and now organizations are trying to develop hybrid cloud strategies and looking to put the right workloads in the cloud. They are dealing with out-of-control unstructured data growth and new database environments like Cassandra, Couchbase and others.

The software has to do its part to help IT maintain the comprehensive strategy. The solution needs to keep up with emerging environments and technologies and to exploit their capabilities fully. Commvault, for example, integrates with multiple storage vendors in order to integrate, schedule and manage those vendors’ snapshot features. Additionally, Commvault has a RESTful API interface that enables cross-integration and management of other stand-alone solutions.

Developing a Comprehensive Index

Extra copies of data are created all the time. IT professionals count on snapshots as a form of data protection, and organizations assume that their backup application can be their archive, or at least their long-term data retention system. The key to meeting these objectives is for the backup solution to provide a powerful, singular index that makes it simple to search for and find needed data. Backup, especially a comprehensive backup solution, has the ability to be a centralized knowledge base about the data the organization holds. Commvault’s index is extremely powerful. Not only can it provide basic metadata about protected data, like create, modify and access dates, it can also provide context-level information about the data it protects. In addition, Commvault’s HyperScale Technology enables this robust index to scale to meet all the needs of the organization.

The three most critical elements of a comprehensive data protection strategy are the ability to scale, the ability to support new environments quickly and the ability to find the information under management. If the comprehensive data protection solution can’t provide all three of these elements, the organization typically ends up going down the path of using environment-specific solutions that have no commonality and no integration. While environment-specific solutions may meet the data protection demands of the day, the fragmentation creates the potential for mistakes and inconsistencies, and it is eventually what leads to the opinion that backup is broken. A comprehensive data protection solution that can deliver the three critical elements enables the organization to ensure foundational protection while integrating with external features like storage system snapshots. The eventual goal is to have a centralized repository for all data protection activity.

The Firm

Storage Switzerland is the leading storage analyst firm focused on the emerging storage categories of memory-based storage (Flash), Big Data, virtualization, and cloud computing. The firm is widely recognized for its blogs, white papers and videos on current approaches such as all-flash arrays, deduplication, SSDs, software-defined storage, backup appliances and storage networking. The name “Storage Switzerland” indicates a pledge to provide neutral analysis of the storage marketplace, rather than focusing on a single vendor approach.

About Our Partner

Commvault is the recognized leader in data backup and recovery. Commvault’s converged data management solution redefines what backup means for the progressive enterprise through solutions that protect, manage and use their most critical asset — their data. Commvault software, solutions and services are available from the company and through a global ecosystem of trusted partners. Commvault employs more than 2,700 highly-skilled individuals across markets worldwide, is publicly traded on NASDAQ (CVLT), and is headquartered in Tinton Falls, New Jersey in the United States. To learn more about Commvault visit www.commvault.com

The Analyst

George Crump is the founder of Storage Switzerland, the leading storage analyst firm focused on the subjects of big data, solid state storage, virtualization, cloud computing and data protection. He is widely recognized for his articles, white papers, and videos on such current approaches as all-flash arrays, deduplication, SSDs, software-defined storage, backup appliances, and storage networking. He has over 25 years of experience designing storage solutions for data centers across the U.S.