
Table of Contents

Chapter 1: Where is the Cloud? The Cloud is a Method, Not a Location By George Crump
Chapter 2: What is a Cloud First Strategy? By Curtis Preston
Chapter 3: What are the Requirements of a Cloud First Strategy? By Joseph Ortiz
Chapter 4: Data Centers Need Open Cloud Integration By George Crump
Chapter 5: Bats Aren't Blind & IT Shouldn't be Either By Curtis Preston
Chapter 6: What is a Cloud Ready Platform? By Joseph Ortiz
Chapter 7: Modernized IT Needs an Intelligent Virtual Data Repository By George Crump
Chapter 8: Is your Backup Software Aware of your Data? By Curtis Preston
Chapter 9: Managing your Cloud's Lifecycle By Joseph Ortiz
Chapter 10: Software Defined Deduplication is Critical to the Cloud By George Crump
Chapter 11: Does Data Management make Data more Mobile? By Curtis Preston
Chapter 12: Data at your Service - Bringing Self-Service to Data Management By Joseph Ortiz


Chapter 1: Where is the Cloud? The Cloud is a Method, Not a Location by George Crump, Lead Analyst

In IT circles, when the discussion of "the cloud" comes up, whether for cloud compute or cloud storage, the first thought is a public cloud like Amazon, Google or Azure. But the reality is that the cloud is not a location; it is a process that allows companies to provide IT services dynamically, in a self-service fashion. The provider of those services can be one of the large public cloud providers, a regional managed service provider or the organization's own internal IT staff.

What is the Cloud Method?

The cloud is more than just a data center that leverages virtualization. Virtualization is an important first step to becoming cloud-like. Cloud organizations use virtualization to create an IT infrastructure that responds to the needs of users and application owners. The result is an on-demand, self-service IT experience. IT uses automation to orchestrate the behind-the-scenes moves, adds and changes so that, to the user, the environment appears to be created for them instantly.

Behind The Cloud

As a result, the cloud is really a description of the customer experience. Users order the IT services they need, and those services seem to appear instantly. They are unaware of the orchestration, configuration and provisioning of the IT infrastructure happening in the background. Behind all of this orchestration is data: the data that application owners and users need access to. While some of this data is created as a result of the request, much of it already exists and needs to be delivered to the requested environment.


Data is Weighing Down The Cloud

The problem is that data is weighing down the cloud, making it less flexible and less efficient. In most data centers, part of the orchestration process is to identify the data the requester will need and then copy that data to the services that have been provisioned. Copying the data takes time and obviously consumes additional capacity. Another challenge is that much of this data comes from legacy applications and is then provisioned to modern workloads.

A Virtual Content Repository

What IT needs is a universal repository that can connect legacy IT with modern IT. It should provide data protection to both legacy and modern environments, creating a universal store of protected data. This content repository should also provide copies of data to requesting users seamlessly, with minimal actual copying of data. Finally, it should connect directly into the orchestration layer so that it can be accessed seamlessly by the provisioning process.


Chapter 2: What is a Cloud First Strategy? by Curtis Preston, Senior Analyst

Can a data protection company that first released its products in the pre-cloud era claim to adopt a "cloud first" strategy? The proof would be in the proverbial pudding, of course. If a company developing products for IT is putting the cloud first, it should be relatively easy to see in its product offerings.

The first way a company that makes data protection software can put the cloud first is to be able to send backups and archives to cloud storage providers. Of course, this means it will need to be able to write via the S3 REST API, but it is a little more than that. Cloud storage vendors bill by the gigabyte, including how much data you send them, how much you retrieve from them and how much you actually end up storing there. Typical backup software products create a full backup followed by a series of "full file" incremental backups, which means they back up an entire file if a single byte in that file changes. Sending full backups and full-file incremental backups directly to the cloud will significantly increase the monthly bill. A company adopting a cloud first strategy would adjust for this by using source deduplication software that deduplicates data before sending it to the cloud.

The next thing companies interested in cloud technologies do is explore moving actual systems into the cloud. The challenge from a data protection standpoint is whether or not the backup client is VM-friendly. Most backup software products wanting to be VM-friendly are written to the VMware and Hyper-V data protection APIs. While this is good for that type of environment, it doesn't work when moving systems into VMs that run in the cloud. What you need at that point is client software that runs in the VM itself but minimizes the impact to the VM. This is typically done through source deduplication software, or via a block-level incremental backup system such as continuous data protection (CDP) or near-CDP. This again minimizes the amount of data transferring out of the VM, which reduces both the performance impact on the VM and the amount of data that must transfer out of the VM for backup purposes. (This helps from both a physics and a billing perspective.)
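To make the source deduplication idea concrete, here is a minimal sketch of deduplicating at the source before anything is sent to object storage. It assumes boto3 and AWS credentials are available; the bucket name and chunk size are placeholders for illustration, not any particular product's implementation.

```python
# A minimal sketch of source-side deduplication to object storage, assuming
# boto3 is installed and "my-backup-bucket" (a placeholder name) already exists.
import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"   # placeholder, not a real bucket
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB fixed-size chunks, chosen only for the example

def already_stored(digest: str) -> bool:
    """Return True if a chunk with this content hash is already in the bucket."""
    try:
        s3.head_object(Bucket=BUCKET, Key=f"chunks/{digest}")
        return True
    except ClientError:
        return False

def backup_file(path: str) -> list[str]:
    """Upload only chunks the bucket has never seen; return the file's chunk list."""
    recipe = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if not already_stored(digest):      # duplicate chunks never leave the source
                s3.put_object(Bucket=BUCKET, Key=f"chunks/{digest}", Body=chunk)
            recipe.append(digest)
    return recipe  # store this recipe to reassemble the file on restore
```

Because unchanged chunks are never re-sent, each incremental backup transfers and stores only new data, which is what keeps the per-gigabyte transfer and storage charges under control.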


If a company has systems in the data center and in the cloud, the next thing it is going to want is the ability to recover a data center system into the cloud, or vice versa. It is also going to want to use that same functionality to seamlessly move systems between the two environments. So a vendor adopting a cloud first strategy is going to make these processes seamless.

If the reader will forgive a slight detour, consider SpaceX, which has adopted a "Mars first" strategy. Yes, the company will continue to build spacecraft that don't go to Mars. The corporate change is that in every development meeting, they ask the question, "Would it work on Mars?" If there are two ways to do something and only one of them will work on Mars, that's the one SpaceX will choose.

Conclusion

Anyone can say "we're cloud first." The question is whether a company continually puts the cloud first in its development processes. As it develops new products and enhances existing ones, a company that adopts a cloud first strategy will continually ask, "Will this work in the cloud?"


Chapter 3: What are the Requirements of a Cloud First Strategy? by Joseph Ortiz, Senior Analyst

A Cloud First Strategy means that as an organization brings new applications or services online, it explores the viability of a cloud deployment before deploying within a more traditional architecture. Many organizations today are considering a Cloud First Strategy as the means to get out of the infrastructure business and move their full complement of business services and data into the cloud. By leveraging the economies of scale for both compute and storage now available in the cloud, organizations can significantly reduce the expense and complexity of storing and managing ever-increasing amounts of unstructured data.

But before an organization can shift its virtual workloads and data to the cloud, it needs to consider the requirements of such a move. It will also need a solution that can facilitate migrating data and workloads to the cloud while maintaining control and governance over the migration, as well as meeting data security, data protection and disaster recovery (DR) requirements.

Basic Requirements for a Cloud First Strategy

An organization considering a Cloud First Strategy needs to weigh a number of factors in order to come up with a comprehensive plan to migrate applications, services and data to the cloud securely, while also ensuring proper data protection strategies are in place. Some of the more important requirements to consider are:

● Data Security – Determine data security, governance and compliance requirements. Organizations need to carefully examine the security protocols and services available from a potential cloud provider to ensure they meet the organization's security requirements. Things such as chain of custody, where and when to encrypt data sent to the cloud, and who controls the encryption keys are very important. Encryption keys should be controlled exclusively by the organization; the cloud provider should not have access to them. (A minimal sketch of this approach appears at the end of this chapter.)


● Identify and evaluate current workloads and data sets for potential migration – Organizations need to carefully examine all of their existing workloads and data sets to determine which would be good candidates for migration to the cloud. Very large organizations may find that a wholesale migration is not viable. In such cases, they need options that allow them to easily bridge and automate their migration based on existing workloads as they extend the scope of their operations across multiple clouds (both private and public), storage fabrics, data locations, use cases, user roles and applications.

● Examine cloud cost factors for compute, data storage and data movement – Moving operations and data to the cloud without proper expense controls can result in excessive waste of expensive resources. Organizations should keep future costs in mind when planning the move to the cloud. Ordering more compute power than needed, failing to schedule software shutdowns during off hours, failing to use monitoring tools to detect wasted computing cycles, or failing to consider the costs of moving data to different regions or back from the cloud for e-discovery or data mining purposes can all result in unexpectedly high bills.

● Ensure appropriate data protection and DR services are in place – While data stored in the cloud is protected to a degree by object storage erasure coding and the cloud provider's backups of its own environment, that does not ensure you can meet specific Recovery Point Objective / Recovery Time Objective (RPO/RTO) requirements in the event of a hardware failure in the cloud provider's environment or if someone deliberately destroys data.

● Organizations should also make sure their data protection and DR strategy provides for copies of data to be stored off-site and preferably offline. Simply replicating copies of the data and backups to other cloud provider locations may technically be considered off-site, but that is not the same as offline. Data that is online, regardless of location, is still susceptible to hacking attacks.

Managing the Migration Process and Data

A cloud migration project is a complex process that requires a comprehensive solution, or set of tools, that enables an organization to control all aspects of a data migration project, as well as the cost-effective management and protection of all data regardless of its location. An ideal solution would also provide auditable compliance, legal and business analytics, control and governance of the data migration process, and meet data protection and DR requirements. Additionally, it should provide robust search capabilities along with the means for other applications to seamlessly access the data it protects and manages without compromising security in any way. Such a solution would significantly enhance ROI and an organization's ability to securely and cost effectively manage its data.
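As a minimal sketch of the "organization controls the encryption keys" requirement above, the example below encrypts data before it ever leaves the organization, using the widely available Python cryptography package. The key handling is deliberately simplified for illustration; in practice the key would live in an on-premises key management system the cloud provider cannot reach.

```python
# A minimal sketch of encrypting data before it is sent to a cloud provider,
# assuming the "cryptography" package; key storage is deliberately simplified.
from cryptography.fernet import Fernet

def generate_key() -> bytes:
    """Generated and stored by the organization (e.g. in an on-premises KMS or HSM);
    the cloud provider never receives this value."""
    return Fernet.generate_key()

def encrypt_for_cloud(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt locally; only this ciphertext ever crosses the wire."""
    return Fernet(key).encrypt(plaintext)

def decrypt_from_cloud(ciphertext: bytes, key: bytes) -> bytes:
    """Decryption also happens inside the organization's environment."""
    return Fernet(key).decrypt(ciphertext)

if __name__ == "__main__":
    key = generate_key()
    blob = encrypt_for_cloud(b"customer ledger 2017-Q3", key)
    assert decrypt_from_cloud(blob, key) == b"customer ledger 2017-Q3"
```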


Chapter 4: Data Centers Need Open Cloud Integration by George Crump, Lead Analyst

Organizations establishing a cloud first strategy are looking for ways to integrate both legacy and modern applications. They are also looking for ways to automate and orchestrate repetitive tasks. The goal is self-service IT, where users order the capabilities they need and the data center automatically adjusts to deliver on those requests.

What is Cloud Integration?

Cloud integration can come in several forms. The first is creating extensions to legacy applications so they can benefit from the cloud. But IT has to be careful with how vendors define integration. For some vendors, "integration" is the sale of an additional appliance that handles the conversion between traditional data center protocols and cloud protocols like object storage and REST APIs. The reality is that this makes no change to the legacy application. Instead of making processes more efficient, it actually adds overhead while data waits for the appliance to convert to and from the cloud. The other problem is that the appliance is another device the organization needs to implement, learn, manage and maintain.

Cloud integration is, or at least should be, an improvement in a vendor's software code that enables it to directly support the cloud. In terms of storage, integration could mean modifying the software so that it natively supports REST APIs and object-based storage. Once the connection to cloud storage is made, the cloud can become a target for old data, old backups or DR copies of backups.

What is Cloud Automation?

The level of automation and orchestration is the key differentiator between a heavily virtualized environment and a cloud. The higher the automation, the more self-service the environment becomes. Automation can be a capability built into the software and available exclusively to that application, or it can be exposed through a publicly available API so it can be programmed externally. In many cases the external API is the preferred method, since it allows a wide variety of orchestration tools to drive the same automation.
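As an illustration of externally programmable automation, the sketch below shows an orchestration tool driving a data protection application over REST. The endpoint, token and payload are hypothetical, invented purely for this example; they do not represent any real product's API. It assumes the Python requests package.

```python
# A hypothetical illustration of driving a data protection solution from an
# orchestration tool over REST. The URL, token, and payload are invented for
# this sketch; no real product API is implied.
import requests

API_BASE = "https://dataprotect.example.internal/api/v1"   # hypothetical endpoint
TOKEN = "replace-with-a-real-token"                         # hypothetical credential

def request_protection(workload_id: str, protection_level: str) -> dict:
    """Ask the (hypothetical) protection service to cover a newly provisioned
    workload, with the protection level driven by a quality-of-service tier."""
    response = requests.post(
        f"{API_BASE}/protection-jobs",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"workload": workload_id, "qos_tier": protection_level},  # e.g. "gold"
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Called by a self-service portal as one step of a provisioning request:
# request_protection("vm-finance-042", "gold")
```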


Extending Data Protection Through The Cloud and Automation

Data protection offers an example of the impact of a developer investing the time to integrate cloud interfaces and automation techniques. Every enterprise has data protection for its legacy application environment, and enterprises are beginning to realize they need it for their modernized IT environment as well. Organizations also need the ability to move data between legacy and modern IT quickly and seamlessly.

A data protection application that is truly cloud integrated can leverage cloud infrastructure to execute backups and store backup data. It can also leverage cloud interfaces to quickly make copies of data that modern applications need, as soon as they need it. Leveraging the data protection process to feed modern applications should greatly reduce not only overall capacity costs, but also the on-premises investment that organizations make in data protection hardware.

Lastly, if that application enables external access by cloud automation and orchestration tools, the data protection solution can participate in the self-service model. It can deliver data to the self-service portal as a component of a provisioning request, or apply protection to an application requesting it. The user can even drive the level of protection based on a quality of service demarcation.

Conclusion

Applications that support the enterprise, like data protection, still have a vital role to play as IT modernizes itself. But to be of value, they have to not just connect to the cloud but integrate with it. Native communication via cloud-friendly protocols is critical. In addition, the software must be externally automatable by a variety of orchestration tools. The simplest way to do this is to make the solution externally accessible via a REST API. The combination of a cloud ready and cloud automated solution allows what was once considered a necessary evil of the legacy data center to become a vital element that not only fulfills its original purpose but extends its usefulness.


Chapter 5: Bats Aren't Blind & IT Shouldn't be Either by Curtis Preston, Senior Analyst

Bats aren't blind. They just can't see. The term "blind as a bat" could just as well be "blind as a person driving in the dark without headlights." The reason bats are "blind" is that they hunt at night, and no creature can see in complete darkness. They're not blind; they can't see because it's dark. That's why they use something we call echolocation, or biosonar, to find the object of the hunt: insects. They also use it to locate each other so they don't run into each other.

Data management software, which often runs at night, needs to find what it's looking for, do what it came to do, and its different processes need to avoid running into each other while they're doing their jobs. And when everything is over, the IT operations team needs to know what happened. That can only happen with data orchestration, control, and automation.

The reason this is important is that there are many ways to accomplish the many tasks one must perform when managing a modern IT organization that includes virtualized infrastructure, private cloud equipment and public cloud services. The most common way to handle many of these tasks is to use the native capabilities of each platform, or software or services written just for that platform. But there's always more than one way to do it. More specifically, there is almost always a cheaper way to do it.

For example, instead of buying backup software that fully integrates with vSphere, you could just write your own scripts to create VMware snapshots at the appropriate time, which would create VSS snapshots inside your Windows VMs. This would give you a stable, if momentary, point-in-time snapshot you can back up via whatever mechanism you prefer. Perhaps you would run another script to create a snapshot on the storage array where the VMs reside.
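For a sense of what that do-it-yourself route looks like, here is a rough sketch of such a script using the open source pyVmomi SDK. The vCenter host, credentials and VM name are placeholders; note everything the sketch leaves out (error handling, scheduling, reporting, escalation), which is exactly the gap the rest of this chapter describes.

```python
# A rough sketch of the kind of do-it-yourself snapshot script the article
# describes, using the pyVmomi SDK. Host, credentials, and the VM name are
# placeholders; error handling, scheduling, and reporting are left out.
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

VCENTER = "vcenter.example.com"   # placeholder
USER = "backup-svc"               # placeholder
PASSWORD = "change-me"            # placeholder

def quiesced_snapshot(vm_name: str) -> None:
    ctx = ssl._create_unverified_context()   # lab shortcut; validate certificates in production
    si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == vm_name)
        # quiesce=True asks VMware Tools to take a VSS-quiesced snapshot inside Windows guests
        WaitForTask(vm.CreateSnapshot_Task(name="pre-backup",
                                           description="momentary point-in-time copy",
                                           memory=False, quiesce=True))
        # ...back up the quiesced disks here, then remove the snapshot...
    finally:
        Disconnect(si)

if __name__ == "__main__":
    quiesced_snapshot("win-sql-01")   # hypothetical VM name
```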


The same is true of services running in the cloud. Most public cloud vendors offer some method of backing up the data you store with them. It may only back that data up within their own system, and it may be a perfectly fine way to back it up. The question is, how do you manage those backups and your onsite backups of vSphere at the same time? Do you learn and manage two completely different products?

Commercial orchestration and automation software is designed to handle myriad problems, from failed snapshots to down VMs or mismatched credentials. When these things happen, the orchestration software has a predefined escalation system it can follow to notify those who can fix the problem. When your management system consists of 100 scripts, how many different places will you need to change things when something as simple as a credential change happens? What about being able to handle multiple levels of escalation?

The other challenge when doing things without commercial orchestration software is what happens when parts of the infrastructure are upgraded. How much effort will your scripts require when you need to upgrade to the latest version of Oracle, vSphere, Hyper-V or SQL Server? The more things you script for, the more scripts you will need to maintain. This is why "free" is never free. Finally, providing centralized reporting of what happens when dozens to hundreds of scripts run is nearly impossible without significantly advanced coding. In contrast, a typical commercial solution includes this as basic functionality.

Conclusion

IT infrastructure teams should spend their time figuring out how to make things better, not how to make things. This chapter has touched on a few of the reasons why commercial data orchestration software is a good idea. Consider using such products wherever possible, and move on to tackling other problems. That will make sure your various IT tools, both onsite and in the cloud, get what they need and don't run into each other in the middle of the night like a bunch of blind bats.


Chapter 6: What is a Cloud Ready Platform? by Joseph Ortiz, Lead Analyst

Once an organization has carefully considered all the various factors, requirements and ramifications of a cloud first strategy and has decided to implement it, it needs the means to migrate its selected data, applications and processes to the cloud in the safest, most efficient manner possible. To accomplish this, organizations need a solid cloud ready platform: a lifecycle-oriented, virtual repository that can span on-premises data centers, remote sites, and cloud-hosted Infrastructure as a Service (IaaS) under a single operations and management solution.

As these platforms emerge, IT professionals will be faced with turnkey platforms that include hardware and software, and software-only platforms that let the organization leverage its existing commodity hardware. The normal pros and cons of turnkey versus software-only apply to these platform decisions, but a software-only approach seems to fit the modern IT mantra better than a turnkey platform.

While many vendor solutions offer open cloud integration interfaces to cloud-based data, application compute or orchestrated tasks, these are only the first steps in transitioning to the cloud. Additional capabilities are needed to efficiently and securely transition data, operations and applications to the cloud in a way that lets the enterprise extend to the cloud quickly while achieving fast ROI on new cloud-powered solutions.

Addressing the Challenges

Some of the more critical tasks a cloud-ready platform should be able to handle are:

● Using a business-aligned data lifecycle policy to manage indexed physical and virtual data instances or versions across snapshot, replication, backup and archive copies (a simple illustration appears at the end of this chapter).

● Providing a securely encrypted and efficiently deduplicated, direct-access extension to cloud storage.


● The ability to call independent sets of cloud provisioning services as part of an orchestration policy. This allows simpler publication of complete service offers that can create new workloads, monitor and take action, roll out and update packages, protect and secure selected data, share and audit usage, and identify waste, as well as automatically retire unused data to avoid escalating usage costs.

● Creating a lifecycle-oriented virtual repository that can span everything from on-premises data centers and remote sites to cloud-hosted IaaS under a single management and operations solution. This allows organizations to discover and manage data, workloads, indexing and search functions, and data flow across different environments based on policy, usage and need. It also handles the challenges of automating orchestrated data movement, workload creation, lifecycle management, operations and secure sharing within or across managed environments.

● Spanning disparate environments from data centers to mobile devices, while providing extended security services such as automatically locking, encrypting, and erasing managed endpoint devices, as well as controlling data access from mobile devices to prevent data loss.

● Content awareness, meaning the ability to assess and categorize data as content. This allows the platform to provide retention methods that, based on rules defined by the organization, can selectively control data movement and retain data in specific geographic locations to comply with data sovereignty laws and regulations, as well as apply compliance holds on managed data sets for eDiscovery and other purposes.

A viable cloud ready platform should not only provide the critical capabilities listed above, it should also provide workload portability across common environments such as Amazon Web Services, Microsoft Azure, VMware, and Hyper-V, performing all the conversions necessary to transform the data to fit the destination workload.

Cloud Ready Platform Advantage

With these features and the ability to virtualize their data sets, organizations can consolidate their disparate data silos and efficiently and securely migrate their data, applications and processes to the cloud. They can also automatically control data movement, based on policy, to ensure that as data ages it is stored on the most cost-effective tier, both on premises and in the cloud. This lets an organization control its cloud storage costs as well. This type of solution would significantly simplify data management while enhancing an organization's ROI and its ability to manage data securely and cost effectively.
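To make the "business aligned data lifecycle policy" capability from the list above concrete, here is an invented illustration of how such a policy might be expressed and evaluated; the tier names, thresholds and copy counts are examples only, not any vendor's schema.

```python
# An invented illustration of how a business-aligned data lifecycle policy
# might be expressed; the field names, tiers, and thresholds are examples only.
from dataclasses import dataclass

@dataclass
class LifecycleRule:
    min_age_days: int   # data older than this...
    target_tier: str    # ...moves to this tier
    copies: int         # how many protected copies to keep at that tier

FINANCE_POLICY = [
    LifecycleRule(min_age_days=0,   target_tier="on-prem-flash",        copies=2),
    LifecycleRule(min_age_days=30,  target_tier="on-prem-object",       copies=2),
    LifecycleRule(min_age_days=365, target_tier="public-cloud-archive", copies=1),
]

def tier_for(age_days: int, policy: list[LifecycleRule]) -> LifecycleRule:
    """Pick the rule that applies to data of a given age."""
    return max((r for r in policy if age_days >= r.min_age_days),
               key=lambda r: r.min_age_days)

print(tier_for(400, FINANCE_POLICY).target_tier)   # "public-cloud-archive"
```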


Chapter 7: Modernized IT Needs an Intelligent Virtual Data Repository by George Crump, Lead Analyst

The cloud is not a single location. It is a combination of private, public and hybrid cloud locations, and an organization may have several of each. The cloud aspect means that data and workloads should move seamlessly between these locations. The problem is that each location is not the same, and data may need some kind of transformation before it can operate in a different location. In addition, there will be times when certain data sets should be prohibited from moving to a particular location; data sovereignty is a good example. Organizations need an intelligent virtual repository that not only enables data to move to different cloud locations but also transforms it so that it will work in the location it needs to move to.

Centralized But Not Primary

The virtualized storage repository cannot be primary storage. Primary storage needs to be static and close to the running workload or user, and it should not have the responsibility of managing the distribution of that data to other points within the cloud infrastructure. The virtual repository should be a combination of secondary storage and software so it can store data cost effectively, versioned, indexed and managed.

A challenge, then, is moving data from production storage into the virtual repository. Obviously this can't be a manual process, and the organization will resist, with good reason, the addition of yet another product that moves or copies data. A modernized data protection application could be the ideal answer: it has to protect production data frequently anyway. Extending that software to create a virtual storage repository is an ideal solution to the problem.

More Than Just a Storage Tank

The virtualized repository also has to be more than a gigantic storage landfill. It has to be able to protect, secure, organize and manage data while it is in the repository and during its removal. These requirements mean the solution needs to understand what it is storing, through both metadata tags and contextual inspection. The repository can leverage this information to make sure that data is distributed correctly based on need and data sovereignty.
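As a simple illustration of how metadata could drive that kind of sovereignty-aware distribution, the sketch below checks a data set's tags before allowing a copy to land in a given region. The tag names and region labels are invented for the example, not any particular product's schema.

```python
# An invented sketch of metadata-driven placement: the repository consults a
# data set's tags before allowing it to be copied to a given cloud region.
SOVEREIGNTY_RULES = {
    "eu-personal-data": {"eu-west-1", "eu-central-1", "on-prem-frankfurt"},
    "us-only":          {"us-east-1", "us-west-2", "on-prem-dallas"},
}

def allowed_targets(dataset_tags: set[str], candidate_regions: set[str]) -> set[str]:
    """Intersect the candidate regions with every sovereignty constraint on the data set."""
    allowed = set(candidate_regions)
    for tag in dataset_tags:
        if tag in SOVEREIGNTY_RULES:
            allowed &= SOVEREIGNTY_RULES[tag]
    return allowed

# A data set tagged as EU personal data may only be copied to EU locations:
print(allowed_targets({"eu-personal-data"}, {"us-east-1", "eu-west-1"}))  # {'eu-west-1'}
```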


The Virtual Data Repository could also use the contextual information to understand how a workload needs to be transformed. For example, if it understands that a VMware image it stores will go to Amazon EC2, the software managing the repository should transform the image to make sure it works in its new location.

Conclusion

A CloudFirst strategy is a new objective for organizations. More than likely, organizations will have legacy physical and virtual environments they will need to maintain and ultimately transform into a more cloud ready environment. In fact, many of these organizations may even have had a VirtualizeFirst strategy before adopting a CloudFirst strategy. But as we discussed in other chapters, a CloudFirst strategy differs from a VirtualizeFirst strategy both in terms of location and in the level of automation.

To help make the transition to a CloudFirst strategy, organizations need to transform existing workloads into cloud-ready workloads. They need a virtual data repository that is more than just a storage dumping ground. That repository needs to be intelligent so it can understand the data it stores. Armed with that information it can, when the use case arises, automatically manage and transform data.


Chapter 8: Is your Backup Software Aware of your Data? by Curtis Preston, Senior Analyst

Twenty years ago, backup applications had absolutely no knowledge of the data they were backing up. Those responsible for backing up applications such as Oracle, Informix, or Sybase were told either to shut down those applications prior to running a backup, or to run a database dump to a file system, which would then be backed up by the backup application. Eventually, backup applications became more aware of such applications and developed ways to manage those backups from within the backup application itself.

But one aspect of data awareness still escapes most backup providers, and that is the content of the files or databases they are backing up: the actual information. For example, most backup applications are able to restore a Microsoft Word document based on its directory and its name. Ask those same backup applications to restore all files with the word "software" in them, and they will draw a blank. They know the names of the files and they know the locations of the files, but they know very little about the contents of the files.

The same is true of database data. You can restore your Exchange server to yesterday. You might be able to restore a given user's account to yesterday. You might even be able to search that user's account for all emails with the word "software" in them and restore those. But ask your backup software to restore all emails from all accounts that have the word "software" in them, or ask it to restore all emails from a given user over a span of time rather than a single point in time, and you will get the same response: a blank stare.

Your backup software vendor may say that this functionality is typically found in an archive product, and they would be right. But in today's cloud-friendly world, knowing the exact location of a file or the source or destination of a given email is increasingly hard. This means backup software is being asked to perform functions archive software would typically provide. In order to satisfy these requests, the backup application must be more aware of the data it is backing up.
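A toy sketch of what that kind of content awareness requires appears below: a catalog that indexes the words inside files as they are backed up, so a request like "restore every file containing the word software" becomes a single lookup. A real product would use a full-text search engine; this example only illustrates the idea.

```python
# A toy illustration of content awareness: a backup catalog that indexes the
# words inside files, not just their names and paths.
import re
from collections import defaultdict

class ContentAwareCatalog:
    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)  # word -> file paths

    def ingest(self, path: str, text: str) -> None:
        """Called while a file is being backed up, so search needs no extra pass later."""
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self._index[word].add(path)

    def files_containing(self, word: str) -> set[str]:
        """'Restore every file with the word software in it' becomes one lookup."""
        return self._index[word.lower()]

catalog = ContentAwareCatalog()
catalog.ingest("/docs/roadmap.docx", "Our software roadmap for 2018")
catalog.ingest("/docs/menu.docx", "Catering menu for the offsite")
print(catalog.files_containing("software"))   # {'/docs/roadmap.docx'}
```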


Conclusion

The proliferation of data across multiple platforms, including cloud platforms, increases the chances that a given file or piece of data will become lost. A data-aware storage platform could be very helpful here, but so could a data-aware backup platform. Being able to restore files based on their content might really help a company that can't seem to find its content.


Chapter 9: Managing your Cloud's Lifecycle by Joseph Ortiz, Senior Analyst

One of the most important recent shifts in enterprise IT operations is the migration of not only ever-increasing amounts of data to the cloud, but also the full complement of business services, as IT begins to implement a cloud first strategy. This allows an operation to take advantage of the cloud's ability to offer attractive pricing for virtually unlimited storage and high performance computing resources on a pay-for-what-you-use basis.

VM Sprawl in the Cloud

One of the key components of a cloud first architecture is virtualization, and one of the by-products of virtualization is the virtual machine (VM). Unlike a physical server, IT can create VMs very quickly and easily. But just like a physical machine, a VM consumes memory, compute cycles and storage space, and needs protection during its lifespan. The ease of creation leads to the rapid proliferation of VMs in the cloud and in the data center.

VM sprawl in the cloud can end up being just as expensive, if not more so, than VM sprawl on-premises, because you have to rent all the resources in the cloud. A VM that is no longer being used and just sitting idle is still consuming compute cycles and memory, and taking up expensive primary storage space that runs up the customer's bill. If the VM is also being backed up on a regular basis, the backups consume still more storage to hold multiple copies of the same image file. Multiply all these factors by hundreds or even thousands of VMs and the costs add up fast.

Just like data, organizations need a means to manage the lifecycle of VMs from their creation, through their use and operation and, ultimately, to their archiving and retirement. This allows IT to contain costs and maximize resources, whether they are in an on-premises private cloud or with an external cloud provider.
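The sketch below shows the kind of idle-VM check a lifecycle tool would automate, using boto3 against EC2 and CloudWatch. The 14-day window and 2 percent CPU threshold are arbitrary examples, and a real policy engine would archive the image and notify the owner rather than simply stopping the instance.

```python
# A simplified sketch of automated idle-VM detection in a public cloud,
# assuming boto3 and AWS credentials. Thresholds are arbitrary examples.
from datetime import datetime, timedelta

import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

def average_cpu(instance_id: str, days: int = 14) -> float:
    """Average CPU utilization over the last `days` days, one datapoint per day."""
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=datetime.utcnow() - timedelta(days=days),
        EndTime=datetime.utcnow(),
        Period=86400,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0

def stop_idle_instances(cpu_threshold: float = 2.0) -> None:
    """Stop running instances whose average CPU suggests nobody is using them."""
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    for r in reservations:
        for inst in r["Instances"]:
            if average_cpu(inst["InstanceId"]) < cpu_threshold:
                ec2.stop_instances(InstanceIds=[inst["InstanceId"]])  # stops compute billing

stop_idle_instances()
```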


Avoiding VM Sprawl in the Cloud

While there are numerous tools that allow admins to check and monitor VMs to determine which are idle and which are in use, doing so is a very time-consuming manual process. So is the process of shutting down and retiring VMs that are past their usefulness, then migrating their images to less expensive storage for archiving or future recall. There is no clear process for accomplishing these time-consuming manual tasks.

What IT needs is a comprehensive, lifecycle-oriented, policy-driven solution that uses customer-defined management policies to automate the process of managing the lifecycle of VMs from creation to eventual retirement. It should be able to span disparate environments, automatically detect idle VMs and shut them down until IT needs them again, while also migrating their images to less expensive storage. The solution should also be able to pool stored images and reduce redundancy across the collection with deduplication. And it should provide a customizable, comprehensive graphical management interface that simplifies the creation and implementation of user-defined policies and allows the entire virtual environment to be managed from a single, easy to use console.

Conclusion

A robust, comprehensive solution with these capabilities can enable an organization to avoid VM sprawl and contain cloud costs while helping ensure full utilization of the organization's cloud resources.


Chapter 10: Software Defined Deduplication is Critical to the Cloud by George Crump, Lead Analyst

The goal of any cloud initiative is to create a cost-effective, flexible environment. These architectures typically store large data sets for long periods of time, so one of the challenges to being cost-effective is the physical cost of storage. Deduplication is critical to extracting maximum value from a cloud first initiative, but the cloud requires a different, more flexible, software defined implementation.

Why We Still Need Deduplication

While the cost per GB of hard disk and even flash storage continues to plummet, when purchased in the quantities needed to meet a typical cloud architecture's capacity demands, storage continues to be the most expensive aspect of the design. And it's not just the per-GB cost; it is also the physical space each additional storage node consumes. Too many nodes can force the construction of a new data center, which is a much bigger cost concern than the price per GB of storage. Deduplication provides a return on the investment by making sure the architecture stores only unique data. That not only reduces the capacity requirement, it also reduces the physical storage footprint.

Organizations will have different cloud strategies. A few may only use the public cloud. Some may only use private cloud architectures. Most, however, will take a hybrid approach, leveraging the public cloud when it makes sense and a private cloud when performance or data retention concerns force them to. In the hybrid model, data should flow seamlessly and frequently between public and private architectures. If the same deduplication technology is implemented in both the private and public cloud architectures, then the technology's understanding of the data can be leveraged to limit the amount of data that has to be transferred, making the network connection between the two more efficient because only unique data segments need to be sent.
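A minimal, purely illustrative sketch of that idea follows: because the deduplication logic is software, the same segment index can run on either side of the hybrid cloud, and a replication job only ships the segments the other side has never seen. The segment size and example data are arbitrary.

```python
# A purely illustrative segment index: since the deduplication logic lives in
# software rather than in a storage appliance, the same index can run
# on-premises or in a public cloud instance, and only segments the remote side
# has never seen need to cross the network.
import hashlib

SEGMENT_SIZE = 128 * 1024   # 128 KiB fixed segments, chosen only for the example

def segment_digests(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + SEGMENT_SIZE]).hexdigest()
            for i in range(0, len(data), SEGMENT_SIZE)]

def segments_to_send(data: bytes, remote_index: set[str]) -> list[str]:
    """Return only the segment hashes the remote side does not already store."""
    return [d for d in segment_digests(data) if d not in remote_index]

# If the public cloud copy already holds yesterday's segments, today's
# replication job ships only the new ones:
public_cloud_index = set(segment_digests(b"A" * SEGMENT_SIZE * 3))
changed_data = b"A" * SEGMENT_SIZE * 3 + b"B" * SEGMENT_SIZE
print(len(segments_to_send(changed_data, public_cloud_index)))   # 1
```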


Why We Need Software Defined Deduplication

The other aspect of a cloud initiative is flexibility, so IT can respond more quickly to any issue. Part of that flexibility is defined in the hybrid model itself: the storage architecture is split into two parts, with the public cloud owning one section and a private cloud owning the rest. While the public cloud has the advantage of low upfront costs, IT cannot specify what types of storage hardware, if any, it uses. The public cloud's consumer-only model requires that all storage services be delivered as software. This includes deduplication, so it has to be available as a software-defined component of the overall data management solution. Software defined deduplication allows the data management software to execute and manage the data efficiency process, which lets it run on anyone's hardware.

Most private cloud solutions will leverage an object storage system as part of the architecture. It may or may not come with its own deduplication feature, but it is unlikely to include a robust data management engine. Implementing a data management software solution that includes deduplication on top of the object storage system provides more flexibility: the organization is free to select any storage hardware. And because it is software, IT can implement the same data efficiency in the cloud, so redundant data does not need to be re-transmitted between private and public clouds, improving network efficiency.

Conclusion

Storage costs may eventually get low enough that deduplication is, well, redundant. But that day is not coming any time soon. And even if storage costs drop to that point, deduplication will not become obsolete. The greater density that a deduplicated storage node achieves reduces the physical footprint of the cloud storage cluster, and a hybrid cloud model will continue to benefit from the network savings gained by not transferring redundant data. Most critical, though, is that the technology be software defined so that it can provide this functionality regardless of hardware or location.


Chapter 11: Does Data Management make Data more Mobile? by Curtis Preston, Senior Analyst

Proper data management can indeed make data more mobile. Put another way, it can appear to make data more mobile by making it appear to be in multiple places at the same time without consuming additional capacity. One of the key elements of data management is managing all of the instances of a given object or file. This chapter will focus on two types of data sets: the primary instance and all secondary instances, also known as copies.

The primary, or production, data set is one that supports a direct business function. It may support customer orders, sales, accounting, or development. A secondary data set is an unmodified copy of the production data set. If it is modified, chances are it was modified to support some business purpose, and therefore it moves back into the primary data set. Examples of secondary copies include development and test systems that use a copy of production data in order to simulate production. Since it is a copy of the production data, it does not need backup. It also does not need to be stored on tier 1 storage. In fact, it might not need to exist onsite at all; it might be just fine up in the cloud.

Some add to this recommendation that if a copy has been modified it should be backed up. However, if a copy has been modified it is no longer a copy; it is its own entity and should receive standard data protection. But if something is truly just a copy of something else, it does not need backup.

In addition to secondary copies, there is also the concept of secondary access: users accessing primary data through some other mechanism in order to support the business. A perfect example is an internal user pulling up a copy of an important spreadsheet on an iPad in order to answer a question in a business meeting. This user does not need another copy of the data; he or she only needs access to the data. A complete data management system that starts with protecting the primary copy can satisfy all of these requirements within a single system.


Development and test systems can easily be given a copy of the primary data by doing a test restore to an alternate location, including the cloud. Making the data cloud-accessible also supports secondary access. As long as the data management system is able to authenticate users via other mechanisms, it can give those users access to all kinds of data on a number of devices, including tablets, phones, or other cloud-enabled systems.

Conclusion

Data management can indeed make data more mobile. It can make sure that primary copies are placed in additional locations (e.g. the cloud) any time a user needs them. In addition to making the data more mobile, it can make the data appear even more mobile by making it accessible from multiple locations. It all starts, however, with protecting and managing the data through a single data management framework, and not treating copies as a completely separate idea.


Chapter 12: Data at your Service - Bringing Self-Service to Data Management by Joseph Ortiz, Senior Analyst

The business environment has changed radically over the last decade. Long gone is the standard 9-to-5, Monday-through-Friday business model. Today, businesses operate 24/7, 365 days a year, and operations are becoming more cloud-like. Never has the old adage "time is money" been more applicable than it is today. In this new business environment, organizations need to be able to respond quickly to sudden, unexpected economic changes and business challenges. To accomplish this, the organization needs to be able to rapidly move any data, and the resources to process it, to any point in the enterprise at any time.

The Help Desk is Obsolete

Businesses typically depend on IT and help desk personnel not only to provide assistance with technical problems but also to implement and deploy new servers, applications, virtual machines (VMs), network storage and infrastructure changes, and to copy or move data to wherever the organization needs it. Employees submit requests to the help desk for a service or resource. The help desk personnel then generate the necessary work orders for the appropriate IT personnel or groups, which implement the required changes and deploy the necessary resources.

The problem with this method is that it is time consuming and, depending on the number of pending requests, can take a long time to complete. In the meantime, part of the business is waiting on IT to handle its requests so it can move ahead with a given project or meet a specific business challenge. That wasted time costs money and is no longer acceptable.

Meeting Business's Speed and Agility Demands

To improve IT's ability to respond more quickly, organizations are attempting to become more cloud-like. This includes not only extending into the cloud, but also learning from cloud providers to create an architecture that is self-service, automated and orchestrated.


Businesses need the ability to provision and manage multi-vendor and multi-cloud infrastructure, applications and resources using user-defined policies, on a self-service basis, 24/7, without having to wait for IT personnel to implement or deploy them. A viable solution would allow users to select from a catalog of service packages that automatically handle all the processes needed to provision a particular workload, as well as implement protection policies, lifecycle management policies and orchestrated data movement. It would also include the ability to automatically power down, migrate, retire or fully archive those workloads from the production cloud. These capabilities would enable businesses to rapidly provision new workloads while still controlling costs by limiting VM sprawl and ensuring data is stored on the appropriate storage tiers and properly managed at each stage of its lifecycle.

Conclusion

Organizations today need to be able to respond quickly to rapid changes in the marketplace and the challenges they face from their competitors. They also need to seamlessly extend the same efficiency and scale benefits they enjoy in their data centers into the cloud. The ability to rapidly provision workloads and position data wherever the organization needs it boosts its ability to deliver faster outcomes on the DevOps programs that are essential to the organization's success. A comprehensive, policy-driven solution that provides orchestration layers to automate user-defined workload policies, delivers new self-service offerings that plug easily into existing management frameworks, and also provides comprehensive protection and lifecycle management can give the organization the ability to respond very quickly to new business challenges as they occur. This ability can provide the organization with a definite competitive edge.


About Storage Switzerland

Storage Switzerland is an analyst firm focused on the storage, virtualization and cloud marketplaces. Our goal is to educate IT professionals on the various technologies and techniques available to help their applications scale further, perform better and be better protected. The results of this research can be found in the articles, videos, webinars, product analyses and case studies on our website, storageswiss.com.

George Crump, Chief Steward

George Crump is President and Founder of Storage Switzerland. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

Curtis Preston, Lead Analyst

W. Curtis Preston (aka Mr. Backup) is an expert in backup and recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston's mission is to arm today's IT managers with truly unbiased information about today's storage industry and its products.

Joseph Ortiz, Lead Analyst

Joseph is an analyst with Storage Switzerland and an IT veteran with over 35 years of experience in the high tech industry. He has held senior technical positions with several OEMs and VARs, providing technical pre- and post-sales support as well as designing, implementing and supporting backup, recovery and data protection / encryption solutions, along with providing disaster recovery planning and testing and data loss risk assessment in distributed computing environments on Unix and Windows platforms.

Copyright © 2017 Storage Switzerland, Inc. All rights reserved.


Sponsored by Commvault

About Commvault Commvault is a leading provider of data protection and information management solutions, helping companies worldwide activate their data to drive more value and business insight and to transform modern data environments. With solutions and services delivered directly and through a worldwide network of partners and service providers, Commvault solutions comprise one of the industry’s leading portfolios in data protection and recovery, cloud, virtualization, archive, file sync and share. Commvault has earned accolades from customers and third party influencers for its technology vision, innovation, and execution as an independent and trusted expert. Without the distraction of a hardware business or other business agenda, Commvault’s sole focus on data management has led to adoption by companies of all sizes, in all industries, and for solutions deployed on premise, across mobile platforms, to and from the cloud, and provided as-a-service. Commvault employs more than 2,000 highly skilled individuals across markets worldwide, is publicly traded on NASDAQ (CVLT), and is headquartered in Tinton Falls, New Jersey in the United States. To learn more about Commvault — and how it can help make your data work for you — visit commvault.com.
