White Paper

USING VMWARE VSPHERE WITH EMC VPLEX
Best Practices Planning

Abstract

This white paper describes EMC® VPLEX™ features and functionality relevant to VMware® vSphere. The best practices for configuring a VMware environment to optimally leverage EMC VPLEX are also presented. The paper also discusses methodologies to migrate an existing VMware deployment to the EMC VPLEX family.

July 2011

Copyright © 2011 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. VMware, ESX, ESXi, vCenter, VMotion, and vSphere are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners. Part Number h7118.3


Table of Contents

Executive summary
  Audience
EMC VPLEX overview
  EMC VPLEX architecture
  EMC VPLEX family
  EMC VPLEX clustering architecture
Provisioning VPLEX storage to VMware environments
EMC Virtual Storage Integrator and VPLEX
Connectivity considerations
Multipathing and load balancing
  VMware ESX version 4.x and NMP
  VMware ESX version 4.x with PowerPath/VE
    PowerPath/VE features
    PowerPath/VE management
  Path Management feature of Virtual Storage Integrator
Migrating existing VMware environments to VPLEX
  Nondisruptive migrations using storage vMotion
  Migration using encapsulation of existing devices
VMware deployments in a VPLEX Metro environment
  VPLEX Witness
  VMware cluster configuration with VPLEX Witness
    VMware cluster configuration
    VMware DRS Groups and Rules
  Cross-connecting VMware vSphere environments to VPLEX clusters for increased resilience
  VMware cluster configuration without VPLEX Witness
  Nondisruptive migration of virtual machines using VMotion in environments without VPLEX Witness
  Changing configuration of non-replicated VPLEX Metro volumes
  Virtualized vCenter Server on VPLEX Metro
Conclusion
References


Executive summary

The EMC® VPLEX™ family of products running the EMC GeoSynchrony™ operating system provides an extensive offering of new features and functionality for the era of cloud computing. EMC VPLEX breaks the physical barriers of data centers and allows users to access a single copy of the data at different geographical locations concurrently, enabling transparent migration of running virtual machines between data centers. This capability allows for transparent load sharing between multiple sites while providing the flexibility of migrating workloads between sites in anticipation of planned events. Furthermore, in case of an unplanned event that causes disruption of services at one of the data centers, the failed services can be restarted at the surviving site with minimal effort, thus minimizing the recovery time objective (RTO).

VMware® vSphere™ virtualizes the entire IT infrastructure, including servers, storage, and networks. The VMware software aggregates these resources and presents a uniform set of elements in the virtual environment. VMware vSphere 4 thus brings the power of cloud computing to the data center, reducing IT costs while also increasing infrastructure efficiency. Furthermore, for hosting service providers, VMware vSphere 4 enables a more economic and efficient path to delivering cloud services that are compatible with customers’ internal cloud infrastructures. VMware vSphere 4 delivers significant performance and scalability improvements that enable even the most resource-intensive applications, such as large databases, to be deployed on internal clouds. With these performance and scalability improvements, VMware vSphere 4 can enable a 100 percent virtualized internal cloud.

The EMC VPLEX family is thus a natural fit for a virtualization environment based on VMware technologies. The capability of EMC VPLEX to provide both local and distributed federation, which allows transparent cooperation of physical data elements within a single site or across two geographically separated sites, lets IT administrators break physical barriers and expand their VMware-based cloud offering. The local federation capabilities of EMC VPLEX allow collection of the heterogeneous data storage solutions at a physical site and present the storage as a pool of resources for VMware vSphere, thus enabling the major tenets of a cloud offering. Extending VPLEX’s capabilities to span multiple data centers enables IT administrators to leverage either private or public cloud offerings from hosting service providers. The synergies provided by a VMware virtualization offering connected to EMC VPLEX thus help customers reduce total cost of ownership while providing a dynamic service that can rapidly respond to the changing needs of their business.

Audience

This white paper is intended for VMware administrators, storage administrators, and IT architects responsible for architecting, creating, managing, and using virtualized IT environments that utilize VMware vSphere and EMC VPLEX technologies. The white paper assumes the reader is familiar with VMware technology, EMC VPLEX, and related software.

EMC VPLEX overview

The EMC VPLEX family with the EMC GeoSynchrony operating system is a SAN-based federation solution that removes physical barriers within and across virtualized data centers. EMC VPLEX is the first platform in the world that delivers both local and distributed federation. Local federation provides the transparent cooperation of physical storage elements within a site, while distributed federation extends the concept between two locations across distance. Distributed federation is enabled by a breakthrough technology available with VPLEX, AccessAnywhere™, which enables a single copy of data to be shared, accessed, and relocated over distance. The combination of a virtualized data center with the EMC VPLEX offering provides customers entirely new ways to solve IT problems and introduce new models of computing. Specifically, customers can:

• Move virtualized applications across data centers

• Enable workload balancing and relocation across sites

• Aggregate data centers and deliver “24 x forever”

EMC VPLEX architecture

EMC VPLEX represents the next-generation architecture for data mobility and information access. The new architecture is based on EMC’s more than 20 years of expertise in designing, implementing, and perfecting enterprise-class intelligent cache and distributed data protection solutions. As shown in Figure 1, VPLEX is a solution for federating both EMC and non-EMC storage. VPLEX resides between the servers and heterogeneous storage assets and introduces a new architecture with unique characteristics:

• Scale-out clustering hardware that lets customers start small and grow big with predictable service levels

• Advanced data caching utilizing large-scale SDRAM cache to improve performance and reduce I/O latency and array contention

• Distributed cache coherence for automatic sharing, balancing, and failover of I/O across the cluster

• A consistent view of one or more LUNs across VPLEX Clusters separated either by a few feet within a data center or across asynchronous distances, enabling new models of high availability and workload relocation


Figure 1. Capability of EMC VPLEX to federate heterogeneous storage

EMC VPLEX family

The EMC VPLEX family consists of three offerings:

• VPLEX Local: This solution is appropriate for customers that would like federation of homogeneous or heterogeneous storage systems within a data center and for managing data mobility between physical data storage entities.

• VPLEX Metro: The solution is for customers that require concurrent access and data mobility across two locations separated by synchronous distances. The VPLEX Metro offering also includes the unique capability where a remote VPLEX Metro site can present LUNs without the need for physical storage for those LUNs at the remote site.

• VPLEX Geo: The solution is for customers that require concurrent access and data mobility across two locations separated by asynchronous distances. The VPLEX Geo offering is currently not supported for live migration of VMware vSphere virtual machines using VMware VMotion®.

The EMC VPLEX family of offerings is shown in Figure 2.


Figure 2. EMC VPLEX family offerings

EMC VPLEX clustering architecture

VPLEX uses a unique clustering architecture to help customers break the boundaries of the data center and allow servers at multiple data centers to have concurrent read and write access to shared block storage devices. A VPLEX Cluster, shown in Figure 3, can scale up through the addition of more engines, and scale out by connecting multiple clusters to form a VPLEX Metro configuration. A VPLEX Metro supports up to two clusters, which can be in the same data center or at two different sites within synchronous distances (less than 5 ms round-trip time). VPLEX Metro configurations help users to transparently move and share workloads, consolidate data centers, and optimize resource utilization across data centers. In addition, VPLEX Clusters provide nondisruptive data mobility, heterogeneous storage management, and improved application availability.



Figure 3. Schematic representation of EMC VPLEX Metro

A VPLEX Cluster is composed of one, two, or four engines. The engine is responsible for federating the I/O stream, and connects to hosts and storage using Fibre Channel connections as the data transport. A single VPLEX Cluster consists of an engine with the following major components:

• Two directors, which run the GeoSynchrony software and connect to storage, hosts, and other directors in the cluster with Fibre Channel and gigabit Ethernet connections

• One Standby Power Supply, which provides backup power to sustain the engine through transient power loss

• Two management modules, which contain interfaces for remote management of a VPLEX Engine

Each cluster also consists of:



• A management server, which manages the cluster and provides an interface from a remote management station

• An EMC standard 40U cabinet to hold all of the equipment of the cluster

Additionally, clusters containing more than one engine also have:

• A pair of Fibre Channel switches used for inter-director communication between various engines

• A pair of Universal Power Supplies that provide backup power for the Fibre Channel switches and allow the system to ride through transient power loss

Provisioning VPLEX storage to VMware environments

EMC VPLEX provides an intuitive, wizard-driven management interface to provision storage to various operating systems, including VMware vSphere. The wizard has both an EZ-Provisioning tab and an Advanced tab. The system also provides a command line interface (CLI) for advanced users. Figure 4 shows the GUI for provisioning storage from EMC VPLEX.


Figure 4. EMC VPLEX GUI management interface – EZ-Provisioning tab


Figure 5. Provisioning illustration

The browser-based management interface, enlarged in Figure 5, schematically shows the various components involved in the process. Storage from EMC VPLEX is exposed using a logical construct called a “Storage View” that is a union of three objects: “Registered initiators”, “VPLEX ports,” and “Virtual Volumes”. The “Registered initiators” object lists the WWPNs of the initiators that need access to the storage. In the case of a VMware environment, the “Registered initiators” entity contains the WWPNs of the HBAs in the VMware ESX® hosts connected to the EMC VPLEX. The “VPLEX ports” object contains the front-end ports of the VPLEX array through which the “Registered initiators” access the virtual volumes. The “Virtual Volumes” object is a collection of volumes that are constructed from the storage volumes provided to the EMC VPLEX from the back-end storage arrays. It can be seen in the red boxed area of Figure 5 that a virtual volume is constructed from a “Device” that in turn can be a combination of different devices built on top of an abstract entity called an “Extent”. The figure also shows that an “Extent” is created from the “Storage Volume” exposed to the EMC VPLEX.


Also shown in Figure 4, in the bottom callout, are the three high-level steps that are required to provision storage from EMC VPLEX. The wizard supports a centralized mechanism for provisioning storage to different cluster members in the case of EMC VPLEX Metro or Geo. The first step in the process of provisioning storage from EMC VPLEX is the discovery of the storage arrays connected to it and the “claiming” of storage that has been exposed to EMC VPLEX. The discovery part of this step rarely needs to be executed, since EMC VPLEX proactively monitors for changes to the storage environment. The wizard not only claims the storage in this step but also creates the extents on that “Storage Volume” and finally the “Virtual Volume” that is created on those extents. These components are called out in Figure 5. Figure 6 shows an example of running through Step 1 of the EZ-Provisioning wizard, which creates all objects from the storage volume to the virtual volume. It can be seen from the figure that the VPLEX software simplifies the process by automatically suggesting user-friendly names for the devices that have been exposed from the storage arrays and using those to generate names for both extents and devices.


Figure 6. Creating virtual volumes in the EZ-Provisioning VPLEX wizard

For the sake of simplicity in VMware environments, it is recommended to create a single extent on the storage volume that was created on the device presented from the storage array. The wizard does this automatically for the user. The virtual volume can be exposed to VMware vSphere, as discussed earlier, by creating a storage view that combines the objects “Registered initiators”, “VPLEX ports”, and “Virtual Volumes”. To do this, the WWNs of the initiators on the VMware ESX hosts have to be registered on EMC VPLEX first. This can be accomplished in Step 2 of the EZ-Provisioning wizard. When “Step 2: Register initiators” is selected, a “Task Help” screen appears, as seen in Figure 7. This dialog box explains how to register the initiators.

Figure 7. Task Help screen


Figure 8. Listing unregistered initiators logged in to the EMC VPLEX

When the initiators are zoned to the front-end ports of the EMC VPLEX, they automatically log in to EMC VPLEX. As seen in Figure 8 these initiators are displayed with the prefix, “UNREGISTERED-”, followed by the WWPN of the initiator. However, initiators can also be manually registered before they are zoned to the front-end ports of the VPLEX. The button highlighted in yellow in Figure 8 should be selected to perform this operation. The initiators logged in to EMC VPLEX can be registered by highlighting the unregistered initiator and clicking the Register button. This is demonstrated in Figure 9. The inset in the figure shows the window that is opened when the Register button is clicked. The inset also shows the facility provided by EMC VPLEX to assign a user-friendly name to the unregistered initiator and also select a host type for the initiator that is being registered. Once the information is added, click OK to complete registration. Note that multiple unregistered initiators may be selected at once for registration.


Figure 9. Registering VMware HBAs on the EMC VPLEX

The final step in provisioning storage from EMC VPLEX to the VMware environment is the creation of the storage view. This is achieved by selecting the final step in the EZ-Provisioning wizard, “Step 3: Create storage view”, on the Provisioning Overview page of the VPLEX management system. Figure 10 shows the first window that is opened. The left-hand pane of the window shows the steps that have to be performed to create a storage view.


Figure 10. Wizard for creating a VPLEX storage view

Stepping through the wizard provisions the appropriate virtual volumes to VMware vSphere using the defined set of VPLEX front-end ports. This is shown in Figure 11. Note that the recommendation for the VPLEX ports that should be used when connecting VMware ESX hosts to EMC VPLEX is discussed in the section “Connectivity considerations.”

Figure 11. Selecting ports for the selected initiators


Once the available ports have been added, virtual volumes can be assigned to the view as seen in Figure 12.

Figure 12. Adding virtual volumes to the storage view

Finally, Figure 13 details the results of the final step of the EZ-Provisioning wizard. With that screen’s message of success, storage has been presented to the VMware vSphere environment and is available for use as a raw device mapping (RDM) or for the creation of a VMware datastore.

Figure 13. Final results view for the EZ-Provisioning wizard


Figure 14 shows the storage view created using the wizard. The WWN of the virtual volume exposed through the view is highlighted in the figure. This information is used by VMware vSphere to identify the devices.

Figure 14. Viewing details of a storage view utilizing the VPLEX management interface

The newly provisioned storage can be discovered on the VMware ESX hosts by performing a rescan of the SCSI bus. The result from the scan is shown in Figure 15. It can be seen that the VMware ESX host has access to a device with WWN 6000144000000010e01443ee283912b8. A quick comparison of the WWN with the information highlighted in green in Figure 14 confirms that the device discovered by the VMware ESX host is indeed the newly provisioned VPLEX virtual volume. The figure also shows the FC organizationally unique identifier (OUI) for EMC VPLEX devices as 00:01:44.


Figure 15. Discovering newly provisioned VPLEX storage on a VMware ESX host

Once the VPLEX devices have been discovered by the VMware ESX hosts, they can be used for creating a VMware file system (datastore) or used as an RDM. However, for optimal performance it is important to ensure that I/Os to the EMC VPLEX are aligned to a 64 KB block boundary. A VMware file system created using the vSphere Client automatically aligns the file system blocks. However, a misaligned partition on a guest operating system can impact performance negatively. Therefore, it is critical to ensure that all partitions created on the guest operating system (either on a virtual disk presented from a VMware file system or an RDM) are aligned to a multiple of 64 KB.
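
As an illustration, the following is a minimal sketch of how alignment could be verified inside a Linux guest. The device name /dev/sdb is hypothetical; the check simply confirms that the partition start sector is a multiple of 128 (128 sectors x 512 bytes = 64 KB).

    # List partitions with sector units; inspect the "Start" column
    fdisk -lu /dev/sdb

    # A start sector such as 128, 2048, or any other multiple of 128
    # indicates the partition is aligned to a 64 KB boundary.
    # A start sector of 63 (the historical MS-DOS default) is misaligned
    # and the partition should be recreated on a 64 KB multiple.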

EMC Virtual Storage Integrator and VPLEX

EMC Virtual Storage Integrator (VSI) for VMware vSphere is a plug-in to the VMware vSphere Client that provides a single management interface used for managing EMC storage within the vSphere environment. Features can be added and removed from VSI independently, providing flexibility for customizing VSI user environments. VSI provides a unified user experience, allowing each of the features to be updated independently, and new features to be introduced rapidly in response to changing customer requirements. Examples of features available for VSI are: Storage Viewer (SV), Path Management, Storage Pool Management (SPM), Symmetrix SRA Utilities, and Unified Storage Management.


Storage Viewer feature

The Storage Viewer feature extends the vSphere Client to facilitate the discovery and identification of EMC Symmetrix®, CLARiiON®, Celerra®, and VPLEX storage devices that are allocated to VMware ESX/ESXi™ hosts and virtual machines. SV presents the underlying storage details to the virtual datacenter administrator, merging the data of several different storage mapping tools into a few seamless vSphere Client views. SV enables users to resolve the underlying storage of Virtual Machine File System (VMFS) and Network File System (NFS) datastores and virtual disks, as well as raw device mappings (RDM). In addition, Storage Viewer also lists storage arrays and devices that are accessible to the ESX and ESXi hosts in the virtual datacenter. When a VMFS datastore resides on a VPLEX volume, Storage Viewer provides details of the Virtual Volumes, Storage Volumes, and Paths that make up the datastore or LUN. Figure 16 is a compilation of all three different views, although only one may be displayed at a time in the plug-in.

Figure 16. Storage Viewer (VSI) datastore view of a VPLEX device

The LUNs view of Storage Viewer provides similar information. Note that in this view, shown in Figure 17, there is a Used By column to inform the user how the LUN is being employed in the environment.


Figure 17. Storage Viewer (VSI) LUN view of a VPLEX device

The other feature of VSI that can be used with VPLEX is Path Management. It will be addressed later in this paper.

Connectivity considerations

EMC VPLEX introduces a new type of storage federation paradigm that provides increased resiliency, performance, and availability. The following paragraphs discuss the recommendations for connecting VMware ESX hosts to EMC VPLEX. The recommendations ensure the highest level of connectivity and availability to VMware vSphere even during abnormal operations. As a best practice, each VMware ESX host in the VMware vSphere environment should have at least two physical HBAs, and each HBA should be connected to at least two front-end ports on director A and director B on EMC VPLEX. This configuration ensures continued use of all HBAs on the VMware ESX host even if one of the front-end ports of the EMC VPLEX goes offline for either planned maintenance events or unplanned disruptions. When a single VPLEX Engine configuration is connected to a VMware vSphere environment, each HBA should be connected to the front-end ports provided on both the A and B directors within the VPLEX Engine. Connectivity to the VPLEX front-end ports should consist of first connecting unique hosts to port 0 of each front-end I/O module on the directors before connecting additional hosts to the remaining ports on the I/O module. A schematic example of the wiring diagram for a four-node VMware vSphere environment connected to a single VPLEX Engine is shown in Figure 18.

Figure 18. Connecting a VMware vSphere server to a single-engine VPLEX Cluster

If multiple VPLEX Engines are available, as is the case in the dual- and quad-engine VPLEX Cluster configurations, the HBAs from the VMware ESX hosts can be connected to different engines. Using both directors on the same engine minimizes cache coherency traffic, while using directors on different engines (with dual and quad configurations) provides greater resiliency. The decision on which configuration to select is based on the desired objectives. For example, one possible connectivity diagram for a four-node VMware ESX cluster connected to a two-engine VPLEX Cluster is schematically shown in Figure 19. It is important to note that in both Figure 18 and Figure 19, the connectivity between the VPLEX Engines and the storage arrays has not been displayed. The connectivity from the VPLEX Engines to the storage arrays should follow the best practices recommendation for the array. A detailed discussion of the best practices for connecting the back-end storage is beyond the scope of this paper. Interested readers should consult the TechBook EMC VPLEX Metro Witness Technology and High Availability.

Figure 19. Connecting ESX hosts to a multiple-engine VPLEX Cluster

When the VMware ESX host is connected to an EMC VPLEX using the best practices discussed in this section, the VMware kernel will associate four paths to each device presented from the system. Figure 20 shows the paths available and used by the VMware kernel for one of the federated devices presented from EMC VPLEX. As can be seen in the figure, the VMware kernel can access the device using one of the four possible paths. It is important to note that the EMC VPLEX is an active/active array that allows simultaneous access to any VPLEX device from any of the front-end ports. This fact is recognized by the VMware kernel automatically, and is highlighted in green in Figure 20. The screenshot is taken from the Virtual Storage Integrator plug-in for the vSphere Client. This plug-in is available for free on Powerlink.


Figure 20. VMware kernel paths for a VPLEX device in Virtual Storage Integrator (VSI)

The connectivity from the VMware ESX hosts to the multiple-engine VPLEX Cluster can be scaled as more engines are added. The methodologies discussed in this section ensure all front-end ports are utilized while providing maximum potential performance and load balancing for VMware vSphere.

Multipathing and load balancing

The VMware ESX host provides native channel failover capabilities. For active/active storage systems, the ESX host by default assigns the first path it discovers to any SCSI-attached device as the preferred path, with a “Fixed” failover policy. This path is always used as the active path for sending I/O to that device unless the path is unavailable due to a planned or an unplanned event. The remaining paths discovered by the VMware ESX host for the device are used as passive failover paths and are utilized only if the active path fails. Therefore, VMware ESX hosts automatically queue all of the I/Os on the first available HBA in the system, while the other HBA is not actively used until a failure on the primary HBA is detected. This behavior leads to an unbalanced configuration on the ESX host and on the EMC VPLEX. There are a number of ways to address this. The most appropriate method, as discussed in the following sections, depends on the multipathing software that is used.

VMware ESX version 4.x and NMP

VMware ESX version 4.x includes advanced path management and load-balancing capabilities exposed through the policies “Fixed”, “Round Robin”, and “Most Recently Used”. The default policy used by the ESX kernel for active/active arrays is “Fixed”. For most active/active arrays, such as the EMC Symmetrix arrays, Round Robin is the most appropriate policy. However, the advanced cache management features provided by the EMC VPLEX can be disrupted by the simple load-balancing algorithm used by the Round Robin policy. Therefore, for VMware ESX version 4.x connected to EMC VPLEX, EMC recommends the use of the Fixed policy with static load balancing achieved by changing the preferred path. In addition, the changes to the preferred path should be performed on all of the ESX hosts accessing the VPLEX devices. The preferred path on VMware ESX version 4 can be set using the vSphere Client. Figure 21 shows the procedure that can be used to set the preferred path for a physical disk in a VMware vSphere environment. Figure 22 shows the preferred path setting for two datastores, each residing on an EMC VPLEX device presented from front-end ports A0-FC00, A1-FC00, B0-FC00, and B1-FC00.

Figure 21. Setting the preferred path on VMware ESX version 4

Figure 22. EMC VPLEX devices with static load balancing on ESX version 4
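
The same Fixed policy and preferred path selection can also be applied from the ESX service console or the vSphere CLI. The following is a minimal sketch assuming ESX 4.x esxcli command names; the path runtime name is a placeholder and should be replaced with a value reported by the host, and the exact option spellings should be verified against the host's esxcli help output.

    # Confirm the path selection policy for the VPLEX device
    esxcli nmp device list -d naa.6000144000000010e01443ee283912b8

    # Set the Fixed policy on the device if it is not already in effect
    esxcli nmp device setpolicy -d naa.6000144000000010e01443ee283912b8 --psp VMW_PSP_FIXED

    # Statically balance load by choosing a different preferred path per datastore
    esxcli nmp fixed setpreferred -d naa.6000144000000010e01443ee283912b8 -p vmhba1:C0:T0:L0

Spreading datastores across different preferred paths, and therefore across different VPLEX front-end ports as shown in Figure 22, achieves the static load balancing described above.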


VMware ESX version 4.x with PowerPath/VE

EMC PowerPath®/VE delivers PowerPath multipathing features to optimize VMware vSphere virtual environments. PowerPath/VE enables standardization of path management across heterogeneous physical and virtual environments, and automates optimal server, storage, and path utilization in a dynamic virtual environment. With hyper-consolidation, a virtual environment may have hundreds or even thousands of independent virtual machines running, including virtual machines with varying levels of I/O intensity. I/O-intensive applications can disrupt I/O from other applications, and before the availability of PowerPath/VE, as discussed in previous sections, load balancing on an ESX host had to be manually configured to correct for this. Manual load-balancing operations to ensure that all virtual machines receive their required response times are time-consuming and logistically difficult to achieve effectively. PowerPath/VE works with VMware ESX and ESXi as a multipathing plug-in (MPP) that provides enhanced path management capabilities to ESX and ESXi hosts. PowerPath/VE is supported with vSphere (ESX version 4) only; previous versions of ESX do not have the Pluggable Storage Architecture (PSA), which is required by PowerPath/VE. PowerPath/VE installs as a kernel module on the vSphere host. As shown in Figure 23, PowerPath/VE plugs in to the vSphere I/O stack framework to bring the advanced multipathing capabilities of PowerPath, dynamic load balancing and automatic failover, to VMware vSphere.

Figure 23. PowerPath/VE vStorage API for multipathing plug-in

At the heart of PowerPath/VE path management is server-resident software inserted between the SCSI device-driver layer and the rest of the operating system. This driver software creates a single “pseudo device” for a given array volume (LUN) regardless of how many physical paths it appears on. The pseudo device, or logical volume, represents all physical paths to a given device. It is then used for creating a VMware file system or for raw device mapping (RDM). These entities can then be used for application and database access.

PowerPath/VE’s value fundamentally comes from its architecture and position in the I/O stack. PowerPath/VE sits above the HBA, allowing heterogeneous support of operating systems and storage arrays. By integrating with the I/O drivers, all I/Os run through PowerPath, making it a single point of I/O control and management. Since PowerPath/VE resides in the ESX kernel, it sits below the guest OS level, application level, database level, and file system level. PowerPath/VE’s unique position in the I/O stack makes it an infrastructure manageability and control point, bringing more value up the stack.

PowerPath/VE features

PowerPath/VE provides the following features:

• Dynamic load balancing – PowerPath is designed to use all paths at all times. PowerPath distributes I/O requests to a logical device across all available paths, rather than requiring a single path to bear the entire I/O burden.

• Auto-restore of paths – Periodic auto-restore reassigns logical devices when restoring paths from a failed state. Once restored, the paths automatically rebalance the I/O across all active channels.

• Device prioritization – Setting a high priority for one or several devices improves their I/O performance at the expense of the remaining devices, while otherwise maintaining the best possible load balancing across all paths. This is especially useful when there are multiple virtual machines on a host with varying application performance and availability requirements.

• Automated performance optimization – PowerPath/VE automatically identifies the type of storage array and sets the highest performing optimization mode by default. For VPLEX, the default mode is Adaptive.

• Dynamic path failover and path recovery – If a path fails, PowerPath/VE redistributes I/O traffic from that path to functioning paths. PowerPath/VE stops sending I/O to the failed path and checks for an active alternate path. If an active path is available, PowerPath/VE redirects I/O along that path. PowerPath/VE can compensate for multiple faults in the I/O channel (for example, HBAs, fiber-optic cables, Fibre Channel switches, storage array ports).

• Monitor/report I/O statistics – While PowerPath/VE load balances I/O, it maintains statistics for all I/O for all paths. The administrator can view these statistics using rpowermt.

• Automatic path testing – PowerPath/VE periodically tests both live and dead paths. By testing live paths that may be idle, a failed path may be identified before an application attempts to pass I/O down it. By marking the path as failed before the application becomes aware of it, timeout and retry delays are reduced. By testing paths identified as failed, PowerPath/VE will automatically restore them to service when they pass the test. The I/O load is then automatically balanced across all active available paths.

PowerPath/VE management

PowerPath/VE uses a command set, called rpowermt, to monitor, manage, and configure PowerPath/VE for vSphere. The syntax, arguments, and options are very similar to the traditional powermt commands used on all other PowerPath multipathing-supported operating system platforms. The one significant difference is that rpowermt is a remote management tool. Not all vSphere installations have a service console interface; in order to manage an ESXi host, customers have the option to use VMware vCenter™ Server or vCLI (also referred to as VMware Remote Tools) on a remote server. PowerPath/VE for vSphere uses the rpowermt command line utility for both ESX and ESXi. PowerPath/VE for vSphere cannot be managed on the ESX host itself; there is neither a local nor a remote GUI for PowerPath on ESX. Administrators must designate a guest OS or a physical machine to manage one or multiple ESX hosts. The rpowermt utility is supported on Windows 2003 (32-bit) and Red Hat 5 Update 2 (64-bit). When the vSphere host server is connected to the EMC VPLEX, the PowerPath/VE kernel module running on the vSphere host associates all paths to each device presented from the array and assigns a pseudo device name (as discussed earlier). An example of this is shown in Figure 24, which shows the output of rpowermt display host=x.x.x.x dev=emcpower11. Note in the output that the device has four paths and displays the default optimization mode for VPLEX devices – ADaptive. The default optimization mode is the most appropriate policy for most workloads and should not be changed.

Figure 24. Output of the rpowermt display command on a VPLEX device
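
A minimal command sketch follows, using the rpowermt syntax referenced above; the host address and pseudo device name are placeholders taken from the example and should be replaced with values from the actual environment.

    # Display paths and the load-balancing policy for a single VPLEX pseudo device
    rpowermt display host=x.x.x.x dev=emcpower11

    # Display a summary of all devices and their paths on the vSphere host
    rpowermt display host=x.x.x.x dev=all

    # The policy field should report the VPLEX default (ADaptive);
    # per the recommendation above, it should not be changed.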

Path Management feature of Virtual Storage Integrator

A far easier way to set the preferred path for all types of EMC devices is to use the Path Management feature of Virtual Storage Integrator. Using this feature, one can set the preferred path on all EMC devices at the ESX host level or at the cluster level. The feature also permits setting a different policy for each type of EMC device: Symmetrix, CLARiiON, and VPLEX. This is extremely useful as a customer may have many different types of devices presented to their vSphere environment. The Path Management feature can set the policy for both NMP and PowerPath/VE. Figure 25 shows the navigation to change the multipathing policy with VSI.

Figure 25. Navigating to the Path Management feature of VSI

The many options within Path Management are displayed in Figure 26.


Figure 26. VSI Path Management options

Through this feature, multiple policy changes can be made at once. In the particular example shown in Figure 27, all hosts have PowerPath/VE installed. All Symmetrix devices under PowerPath control will be set to Symmetrix Optimization, while all VPLEX devices will be set to Adaptive.


Figure 27. Making multiple multipathing changes

Migrating existing VMware environments to VPLEX

Existing deployments of VMware vSphere can be migrated to VPLEX environments, and there are a number of different alternatives that can be leveraged. The easiest method to migrate to a VPLEX environment is to use storage vMotion. However, this technique is viable only if the target storage array has sufficient free capacity to accommodate the largest datastore in the VMware environment. Furthermore, storage vMotion may be tedious if several hundred virtual machines or terabytes of data have to be migrated, or if the virtual machines have existing snapshots. For these scenarios, it might be appropriate to leverage the capability of EMC VPLEX to encapsulate existing devices. However, this methodology is disruptive and requires a planned outage of VMware vSphere.

Nondisruptive migrations using storage vMotion

Figure 28 shows the datastores available on a VMware ESX version 4.1 host managed by a vSphere vCenter Server. The view is available using the Storage Viewer feature of EMC Virtual Storage Integrator. The datastores are backed by both VPLEX and non-VPLEX devices.

Figure 28. Datastore view in Storage Viewer (VSI)

It can be seen from Figure 29 that the virtual machine “W2K8 VM1” resides on datastore Management_Datastore_1698, hosted on device CA9 on a Symmetrix VMAX™ array.

Figure 29. Details of the EMC storage device displayed by EMC Storage Viewer

The migration of the data from the Symmetrix VMAX array to the storage presented from VPLEX can be performed using storage vMotion once appropriate datastores are created on the devices presented from VPLEX. In this example the VM “W2K8 VM1” will be migrated from its current datastore on the Symmetrix to the datastore vplex_boston_local, which resides on the VPLEX and was shown earlier in Figure 17.


Figure 30 shows the steps required to initiate the migration of a virtual machine from Management_Datastore_1698 to the target datastore, vplex_boston_local. The storage vMotion functionality is also available via a command line utility; a hedged example of that form is sketched after Figure 30. A detailed discussion of storage vMotion is beyond the scope of this white paper. Further details on storage vMotion can be found in the VMware documentation listed in the “References” section.

Figure 30. Using storage vMotion to migrate virtual machines to VPLEX devices
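
The command line form mentioned above is the svmotion utility shipped with the vSphere CLI. The following is a minimal sketch, assuming vSphere CLI 4.x option names and a hypothetical vCenter address, datacenter name, and .vmx path; verify the exact syntax against the vSphere CLI documentation before use.

    # Relocate "W2K8 VM1" from the VMAX-backed datastore to the VPLEX-backed datastore
    svmotion --url=https://vcenter.example.com/sdk \
             --username=administrator \
             --datacenter='Boston_DC' \
             --vm='[Management_Datastore_1698] W2K8 VM1/W2K8 VM1.vmx:vplex_boston_local'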

Migration using encapsulation of existing devices

As discussed earlier, although storage vMotion provides the capability to perform a nondisruptive migration from an existing VMware deployment to EMC VPLEX, it might not always be a viable tool. For these situations, the encapsulation capabilities of EMC VPLEX can be leveraged. The procedure is disruptive, but the duration of the disruption can be minimized by proper planning and execution.


The following steps need to be taken to encapsulate and migrate an existing VMware deployment.

1. Zone the back-end ports of EMC VPLEX to the front-end ports of the storage array currently providing the storage resources.

2. Change the LUN masking on the storage array so the EMC VPLEX has access to the devices that host the VMware datastores. In the example below, the devices 4EC (for Datastore_1) and 4F0 (for Datastore_2) have to be masked to EMC VPLEX. Figure 31 shows the devices that are visible to EMC VPLEX after the masking changes have been made and a rescan of the storage array has been performed on EMC VPLEX. The figure also shows the SYMCLI output of the Symmetrix VMAX devices and their corresponding WWNs. A quick comparison clearly shows that EMC VPLEX has access to the devices that host the datastores that need to be encapsulated.

Figure 31. Discovering devices to be encapsulated on EMC VPLEX

3. Once the devices are visible to EMC VPLEX they have to be claimed. This step is shown in Figure 32. The “-appc” flag during the claiming process ensures that the content of the device being claimed is preserved, and that the device is encapsulated for further use within the EMC VPLEX. (A hedged CLI sketch covering steps 3 through 6 appears after this numbered list.)


Figure 32. Encapsulating devices in EMC VPLEX while preserving existing data

4. After claiming the devices, a single extent that spans the whole disk has to be created. Figure 33 shows this step for the two datastores that are being encapsulated in this example.

Figure 33. Creating extents on encapsulated storage volumes claimed by VPLEX


5. A VPLEX device (local device) with a single RAID 1 member should be created using the extent that was created in the previous step. This is shown for the two datastores, Datastore_1 and Datastore_2, hosted on device 4EC and 4F0, respectively, in Figure 34. The step should be repeated for all of the storage array devices that need to be encapsulated and exposed to the VMware environment.

Figure 34. Creating a VPLEX RAID 1 protected device on encapsulated VMAX devices

6. A virtual volume should be created on each VPLEX device that was created in the previous step. This is shown in Figure 35 for the VMware datastores Datastore_1 and Datastore_2.

Figure 35. Creating virtual volumes on VPLEX to expose to VMware vSphere

7. It is possible to create a storage view on EMC VPLEX by manually registering the WWNs of the HBAs on the VMware ESX hosts that are part of the VMware vSphere domain. The storage view should be created first to allow VMware vSphere access to the virtual volumes that were created in step 6. By doing so, the disruption to the service during the switchover from the original storage array to EMC VPLEX can be minimized. An example of this step for the environment used in this study is shown in Figure 36.


Figure 36. Creating a storage view to present encapsulated devices to VMware ESX hosts

8. In parallel to the operations conducted on EMC VPLEX, new zones should be created that allow the VMware ESX hosts involved in the migration access to the front-end ports of EMC VPLEX. These zones should also be added to the appropriate zone set. Furthermore, the zones that provide the VMware ESX hosts access to the storage array whose devices are being encapsulated should be removed from the zone set. However, the modified zone set should not be activated until the maintenance window when the VMware virtual machines can be shut down. It is important to ensure that the encapsulated devices are presented to the ESX hosts only through the VPLEX front-end ports. The migration of the VMware environment to VPLEX can fail if devices are presented from both VPLEX and the storage subsystem to the VMware ESX hosts simultaneously. Furthermore, there is a potential for data corruption if the encapsulated devices are presented simultaneously from the storage array and the VPLEX system.

9. When the maintenance window opens, all of the virtual machines that would be impacted by the migration should first be shut down gracefully. This can be done either with the vSphere Client or with command line utilities that leverage the VMware SDK.

10. Activate the zone set that was created in step 8. A manual rescan of the SCSI bus on the VMware ESX hosts should remove the original devices and add the encapsulated devices presented from the VPLEX system.

11. The devices presented from the VPLEX system host the original datastores. However, the VMware ESX hosts do not automatically mount the datastores, since VMware ESX considers them snapshots: the WWNs of the devices exposed through the VPLEX system differ from the WWNs of the devices presented from the Symmetrix VMAX system.

12. Figure 37 shows an example of this for a VMware vSphere environment. The figure shows that all of the original virtual machines in the environment are now marked as inaccessible. This occurs because the datastores, Datastore_1 and Datastore_2, created on the devices presented from the VMAX system are no longer available.
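
The claiming and device-creation operations above (steps 3 through 6) can also be performed from the VPLEX CLI. The following is an indicative sketch only: the command names exist in the GeoSynchrony VPlexcli, but the exact option spellings and contexts vary by release, and the storage-volume identifier and object names shown here (VPD83T3:..., Symm_4EC and so on) are hypothetical. Verify everything against the VPLEX CLI guide for the installed release.

    # Step 3: claim the storage volume while preserving its contents (application consistent)
    VPlexcli:/> storage-volume claim -appc -d VPD83T3:60000970000192601699533030344543

    # Step 4: create a single extent spanning the whole claimed storage volume
    VPlexcli:/> extent create -d Symm_4EC

    # Step 5: create a local device with a single RAID 1 leg on that extent
    VPlexcli:/> local-device create --geometry raid-1 --extents extent_Symm_4EC_1 --name device_Symm_4EC

    # Step 6: create the virtual volume that will be exported to the ESX hosts
    VPlexcli:/> virtual-volume create -r device_Symm_4EC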

Figure 37. Rescanning the SCSI bus on the VMware ESX hosts

VMware vSphere allows access to datastores that are considered snapshots in two different ways: the snapshot can either be resignatured or be persistently mounted. In VMware vSphere environments, the resignaturing process for datastores that are considered snapshots can be performed on a device-by-device basis. This reduces the risk of mistakenly resignaturing the encapsulated devices from the VPLEX system. Therefore, for a homogeneous vSphere environment (that is, one in which all ESX hosts are at version 4.0 or later), EMC recommends the use of persistent mounts for VMware datastores that are encapsulated by VPLEX. The use of persistent mounts also provides other advantages, such as retaining the history of all of the virtual machines. The datastores on devices encapsulated by VPLEX can also be accessed by resignaturing them. However, this method adds unnecessary complexity to the recovery process and is not recommended. Therefore, the procedure to recover a VMware vSphere environment utilizing that method is not discussed in this document.


A detailed discussion of the process to persistently mount datastores is beyond the scope of this white paper. Readers should consult the VMware document Fibre Channel SAN Configuration Guide available on www.vmware.com. The results after the persistent mounting of the datastores presented from EMC VPLEX are shown in Figure 38. It can be seen that all of the virtual machines that were inaccessible are now available. The persistent mount of the datastores considered snapshots retains both the UUID of the datastore and the label. Since the virtual machines are cross-referenced using the UUID of the datastores, the persistent mount enables vCenter Server to rediscover the virtual machines that were previously considered inaccessible.

Figure 38. Persistently mounting datastores on encapsulated VPLEX devices
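
For reference, ESX 4.x exposes the persistent mount operation through the esxcfg-volume command (vicfg-volume when using the vSphere CLI against ESXi). The following is a minimal sketch using the datastore labels from this example; it would be run on each ESX host, and the option behavior should be confirmed against the Fibre Channel SAN Configuration Guide referenced above.

    # List volumes that the host has detected as snapshots/replicas
    esxcfg-volume -l

    # Persistently mount each encapsulated datastore, keeping its original UUID and label
    esxcfg-volume -M Datastore_1
    esxcfg-volume -M Datastore_2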

VMware deployments in a VPLEX Metro environment

EMC VPLEX breaks the physical barriers of data centers and allows users to access data at different geographical locations concurrently. In a VMware context this enables functionality that was not available previously. Specifically, the ability to concurrently access the same set of devices independent of the physical location enables geographically stretched clusters based on VMware vSphere.1 This allows for transparent load sharing between multiple sites while providing the flexibility of migrating workloads between sites in anticipation of planned events such as hardware maintenance. Furthermore, in case of an unplanned event that causes disruption of services at one of the data centers, the failed services can be quickly and easily restarted at the surviving site with minimal effort. Nevertheless, the design of the VMware environment has to account for a number of potential failure scenarios and mitigate the risk of service disruption. The following paragraphs discuss the best practices for designing the VMware environment to ensure an optimal solution. For further information on EMC VPLEX Metro configuration, readers should consult the TechBook EMC VPLEX Metro Witness Technology and High Availability available on Powerlink.

1 The solution requires extension of the VLAN to the different physical data centers. Technologies such as Cisco’s Overlay Transport Virtualization (OTV) can be leveraged to provide the service.

VPLEX Witness

VPLEX uses rule sets to define how a site or link failure should be handled in a VPLEX Metro or VPLEX Geo configuration. If two clusters lose contact, the rule set defines which cluster continues operation and which suspends I/O. The rule set is applied on a device-by-device basis or for a consistency group. The use of rule sets to control which site is the winner, however, adds unnecessary complexity in case of a site failure, since it may be necessary to manually intervene to resume I/O at the surviving site. VPLEX with GeoSynchrony 5.0 introduces a new concept to handle such an event: the VPLEX Witness. VPLEX Witness is a virtual machine that runs in an independent (third) fault domain. It provides the following features:

• Active/active use of both data centers

• High availability for applications (no single points of storage failure, auto-restart)

• Fully automatic failure handling

• Better resource utilization

• Lower capital expenditures and lower operational expenditures as a result

Typically, data centers implement highly available designs within a data center and deploy disaster recovery functionality between data centers. This is because within the data center, components operate in an active/active (or active/passive with automatic failover) manner. Between data centers, however, legacy replication technologies use active/passive techniques and require manual failover to use the passive component. When the VPLEX Metro active/active replication technology is used in conjunction with VPLEX Witness, the lines between local high availability and long-distance disaster recovery are somewhat blurred, because high availability is stretched beyond the data center walls. A configuration that uses any combination of VPLEX Metro and VPLEX Witness is considered a VPLEX Metro HA configuration. The key to this environment is AccessAnywhere. It allows both clusters to provide coherent read/write access to the same virtual volume. That means that on the remote site, the paths are up and the storage is available even before any failover happens. When this is combined with host failover clustering technologies such as VMware HA, one gets a fully automatic application restart for any site-level disaster. The system rides through component failures within a site, including the failure of an entire array. VMware ESX can be deployed at both VPLEX clusters in a Metro environment to create a high availability environment. Figure 39 shows the Metro HA configuration that will be used in this paper.


Figure 39. VMware Metro HA with VPLEX Witness

In this scenario, a virtual machine can write to the same distributed device from either cluster. In other words, if the customer is using VMware Distributed Resource Scheduler (DRS), which allows automatic load distribution of virtual machines across multiple ESX servers, a virtual machine can be moved from an ESX server attached to Cluster-1 to an ESX server attached to Cluster-2 without losing access to the underlying storage. This configuration allows virtual machines to move between two geographically disparate locations with up to 5 ms of latency, the limit to which VMware VMotion is supported. In the event of a complete site failure, VPLEX Witness automatically signals the surviving cluster to resume I/O rather than following the rule set. VMware HA detects the failure of the virtual machines and restarts them automatically at the surviving site with no external intervention. It is important to note that a data unavailability event can occur when there is not a full site outage but there is a VPLEX outage on Cluster-1 and the virtual machine is currently running on the ESX server attached to Cluster-1. If this configuration also contains a VPLEX Witness, the witness recognizes the outage and recommends that Cluster-2 resume I/O rather than following the rule set. However, VMware vSphere does not recognize these types of failures and does not automatically move the failed virtual machines to the surviving site. To protect against this, one can leverage the cross-connectivity feature of the VPLEX storage system. Alternatively, users can intervene and manually move the virtual machine to the Cluster-2 ESX server to provide access to the data. These options are discussed in the section “Cross-connecting VMware vSphere environments to VPLEX clusters for increased resilience.” The following section details an example of a VPLEX Metro configuration with VMware HA and VPLEX Witness and its behavior in case of a complete site failure.

VMware cluster configuration with VPLEX Witness

VMware cluster configuration

Figure 40 shows the recommended cluster configuration for VMware deployments that leverage devices presented through EMC VPLEX Metro with VPLEX Witness employed. It can be seen from the figure that VMware vSphere consists of a single VMware cluster. The cluster includes four VMware ESX hosts, with two at each physical data center (Site A and Site B). Also shown in the figure, as an inset, are the settings for the cluster. The inset shows that VMware DRS and VMware HA are active in the cluster and that VM Monitoring is activated.


Figure 40. Configuration of VMware clusters with EMC VPLEX Metro utilizing VPLEX Witness

To take full advantage of the HA cluster failover capability in a VPLEX Metro cluster that employs VPLEX Witness, it is necessary to create DRS Groups and then Rules that govern how the VMs will be restarted in the event of a site failure. Setting these up is a fairly simple procedure in the vSphere Client. This will allow the VMs to be restarted in the event of a failure at either Site A or Site B.

VMware DRS Groups and Rules

To access the wizards to create DRS Groups and Rules, right-click on the cluster (Boston) and navigate to VMware DRS. Leave the automation level at “Fully automated” to permit VMware to move the virtual machines as necessary, as shown in Figure 41.


Figure 41. VMware DRS

For DRS groups, one should create two Virtual Machines DRS Groups and two Host DRS Groups. In Figure 42 four groups have been created: VPLEX_Cluster1_VM_Group contains the VMs associated with cluster-1, and VPLEX_Cluster1_Host_Group contains the hosts associated with cluster-1. The cluster-2 setup is similar.


Figure 42. DRS Groups for the cluster Boston


Now that the DRS groups are in place, rules need to be created to govern how the DRS groups should behave when there is a site failure. There are two rules, one that applies to cluster-1 and one that applies to cluster-2. The rule Boston_VMs_Affinity_to_Cluster-1, seen in Figure 43, dictates that the VMs associated with cluster-1 (through the DRS group) “should run” on the hosts associated with cluster-1 (again through the DRS group); the rule for cluster-2’s VMs is the mirror image. It is important that the condition for both rules is “should run” and not “must run,” since this gives the VMs the flexibility to start up on the two hosts that survive a site failure. Each rule therefore permits the VMs associated with the failing cluster to be brought up on the two hosts at the site that did not fail and, most importantly, to automatically migrate back to their original hosts when the site failure is resolved. A scripted equivalent of this group and rule definition is sketched below.
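For environments where this configuration is automated, the same VM and Host DRS groups and the “should run” rule can be defined through the vSphere API. This is a minimal sketch using the pyVmomi Python bindings; it assumes that the cluster object and the lists of cluster-1 VMs and hosts have already been looked up (for example, with a container view as in the earlier sketch), and the group and rule names simply mirror those used in the figures.

from pyVmomi import vim

def add_should_run_rule(cluster, cluster1_vms, cluster1_hosts):
    """Create a VM group, a host group, and a 'should run' VM-Host affinity rule."""
    vm_group = vim.cluster.VmGroup(name="VPLEX_Cluster1_VM_Group", vm=cluster1_vms)
    host_group = vim.cluster.HostGroup(name="VPLEX_Cluster1_Host_Group",
                                       host=cluster1_hosts)
    rule = vim.cluster.VmHostRuleInfo(
        name="Boston_VMs_Affinity_to_Cluster-1",
        enabled=True,
        mandatory=False,                       # "should run", not "must run"
        vmGroupName="VPLEX_Cluster1_VM_Group",
        affineHostGroupName="VPLEX_Cluster1_Host_Group")
    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[vim.cluster.GroupSpec(info=vm_group, operation="add"),
                   vim.cluster.GroupSpec(info=host_group, operation="add")],
        rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

An equivalent pair of groups and a second rule would be defined for cluster-2.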

Figure 43. DRS Rules for cluster-1 and cluster-2 in VPLEX Metro utilizing VPLEX Witness

Site failure

To demonstrate the effectiveness of VPLEX Witness in a VPLEX Metro with VMware, a site failure test was conducted based upon the previous configuration. At the start of the test, the VMs were running on their preferred hosts, as seen in Figure 44.


Figure 44. Site failure test with VPLEX Witness – VM configuration

The test mimicked a complete site failure by shutting down both the VPLEX cluster at the site and the two ESX hosts. Before testing, connectivity to the VPLEX Witness from both sites was confirmed, as shown in Figure 45.


Figure 45. Confirming VPLEX Witness connectivity from VPLEX sites

In this example there are two consistency groups, one for each site, each containing a distributed device. The two consistency groups are seen in Figure 46, while the two distributed devices are shown in Figure 47. Note that with VPLEX Witness in use, the detach rules of the individual distributed devices are superseded: in a site failure scenario, the remaining cluster takes possession of all distributed devices, continuing I/O to each one regardless of the detach rule.


Figure 46. Consistency groups in site failure example

Figure 47. Distributed devices

Each of these distributed devices hosts two virtual machines. The VMs are spread across the two distributed devices as shown in Figure 48.


Figure 48. VM datastore location

Upon site failure, the VMs associated with the failed site will crash and the surviving site will take ownership of all the distributed devices. VMware HA then uses the DRS rules in place and restarts the failed VMs on the surviving nodes, maintaining their datastore location. The actions take place in a series of steps:

1. Site failure occurs – both hosts and the VPLEX cluster are down (Figure 49 and Figure 50).

Figure 49. VPLEX site failure


Figure 50. Hosts and VMs inaccessible after site failure

2. VMware HA recognizes that there is an issue with the hosts and VMs and initiates a failover action based upon the DRS rules configured previously. This is shown in Figure 51.


Figure 51. VMware HA initiating action

3. VMware restarts the failed virtual machines on the remaining available hosts, as shown in Figure 52.


Figure 52. Restarting of virtual machines on the remaining hosts after site failure

Recovery

4. When the VPLEX site comes back online after the site failure, the distributed devices in the consistency groups need to be resumed at the failed site. This is accomplished using the VPLEXcli command resume-at-loser, shown in Figure 53. Unless explicitly configured otherwise (using the “auto-resume-at-loser” property), I/O remains suspended on the losing cluster. This prevents applications at the losing cluster from experiencing a spontaneous data change; the delay allows the administrator to shut down applications if necessary.

Figure 53. Resuming I/O on the failed site

Depending on the activity that took place on the surviving site during the outage, the volumes may need rebuilding on the failed site. This could take some time to complete depending on the amount of changes, but the system remains available during the rebuild. From the GUI, one would see a degraded state for the device as it rebuilds, while from the CLI (Figure 54) the health status indicates that the volume is rebuilding.

Figure 54. Rebuilding a virtual volume after resuming from site failure

5. As the ESX hosts come back online, they re-enter the VMware HA cluster. Once again the DRS rules come into play and the virtual machines are migrated from their current hosts back to their original hosts. This is seen in Figure 55 and is completely transparent to the user.


Figure 55. Virtual machines restored to original hosts

The environment is now returned to full functionality.

Cross-connecting VMware vSphere environments to VPLEX clusters for increased resilience

The previous section described the advantage of introducing a VPLEX Witness server into a VMware vSphere environment. The VPLEX Witness appliance enables customers with two sites separated by metropolitan distance to create a single large VMware vSphere cluster that leverages VMware HA to provide transparent failover to the surviving cluster in case of a site failure. However, due to limitations in the VMware vSphere ESX kernel, the configuration discussed in the previous section does not provide resilience against a localized site disruption that results in the failure of the VPLEX system at that site but leaves the VMware vSphere environment operating properly. (The VPLEX system provides redundant hardware to protect against failure of individual components; a catastrophic failure of a VPLEX system is thus a highly unlikely event.)

Figure 56 shows a normally operating VMware vSphere environment presented with storage from a VPLEX storage array. It can be seen in the figure that the ESX host, svpg-dell-c1-s09, has a number of datastores presented to it, three of which (highlighted in green) are from a VPLEX system. The inset shown in the figure is for one of the virtual machines (Distributed_RH_1) hosted on the datastore, Distributed_DSC_Site_A, created on a distributed RAID-1 VPLEX volume. The inset shows the status of the VMware tools running within the guest operating system as normal.

Figure 56. Display of a normally operating VMware vSphere environment with VPLEX volumes

The state of the VMware vSphere cluster in response to a complete failure of the VPLEX system is shown in Figure 57. It can be observed from the figure that the host svpg-dell-c1-s09, located at the failed site, is still communicating with the vSphere vCenter Server although it has lost all access to the VPLEX virtual volumes. The datastores hosted on the failed VPLEX volumes are either missing or reported as inactive. In addition, as can be seen from the inset in Figure 57, the status of the VMware tools running in the virtual machine exhibited in Figure 56 has now changed to “Not Running.”

The state exhibited in Figure 57 occurs due to the response of the VMware ESX kernel to a situation where all paths to a device (or devices) are unavailable (frequently referred to as an APD condition). In this situation, although the applications running on the virtual machines hosted on the missing VPLEX volumes have failed, VMware HA cannot perform recovery of the virtual machines and the applications that they host. The only way to recover from such a situation is through manual intervention: users have to manually restart their ESX hosts for VMware HA to detect the failure of the VMs that it is monitoring and perform an appropriate corrective action.
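The symptoms described above (inactive datastores and VMware tools reported as not running) can be surfaced quickly from the vSphere inventory. The following is a minimal monitoring sketch using the pyVmomi Python bindings; it assumes a service instance has already been connected as in the earlier sketches, and the datastore-name filter is purely illustrative.

from pyVmomi import vim

def report_apd_symptoms(si, datastore_prefix="Distributed_"):
    """Print datastore accessibility and tools status for VMs on matching datastores."""
    content = si.RetrieveContent()
    ds_view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in ds_view.view:
        if not ds.summary.name.startswith(datastore_prefix):
            continue
        print("%s accessible=%s" % (ds.summary.name, ds.summary.accessible))
        for vm in ds.vm:
            # 'guestToolsNotRunning' on a previously healthy VM is one symptom of an APD event
            print("  %-30s tools=%s power=%s" % (
                vm.name, vm.guest.toolsRunningStatus, vm.runtime.powerState))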

Figure 57. State of the vSphere environment after complete failure of VPLEX system

The incorrect behavior of VMware ESX hosts, and the resulting application unavailability in response to the failure of a VPLEX system, can be avoided by leveraging another feature of GeoSynchrony version 5.0. Starting with this version, it is possible to cross-connect hosts to VPLEX directors located at the remote site. When this feature is leveraged in conjunction with VPLEX Witness, ESX hosts can ride through a localized VPLEX system failure by redirecting their I/O to the VPLEX directors at the surviving site. However, it should be noted that to leverage the cross-connect feature, the round-trip latency between the two sites involved in a VPLEX metropolitan configuration cannot be greater than 1 ms. (Customers requiring support for larger latencies should submit an RPQ to EMC for further consideration.)

Figure 58 shows the paths that are available to the ESX host, svpg-dell-c1-s09, to access the VPLEX distributed RAID volume hosting the datastore Distributed_DSC_Site_A. The screenshot, generated using EMC Virtual Storage Integrator, shows that the ESX host has eight distinct paths to the device, four of which are presented from a pair of directors in one VPLEX system and the remainder from a pair of directors in a different VPLEX system. However, the information shown in the figure is not in and of itself sufficient to determine whether the VPLEX systems are collocated at a single physical location or separated. In addition, it can be seen in Figure 58 that the VMware Native Multipathing (NMP) software cannot distinguish the locality of the VPLEX front-end ports and marks all paths as active (EMC PowerPath/VE exhibits the same behavior). Therefore, users should be cognizant of the possibility that failure of a path to one of the front-end ports of the local VPLEX system can result in the ESX host accessing the data through directors at the secondary site, resulting in slightly degraded performance.

Figure 59 shows the WWNs of the VPLEX front-end ports of directors at both sites used in the example presented in this section. A comparison of the WWNs shown in this figure with those presented in Figure 58 indicates that the ESX host, svpg-dell-c1-s09, can access the device hosting the datastore, Distributed_DSC_Site_A, through the front-end ports of directors at both sites.
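Because NMP reports all eight paths as active regardless of locality, it can be useful to dump the per-device path states when validating a cross-connected configuration or after a failure. The sketch below uses the pyVmomi Python bindings; the host object is assumed to have been looked up already, and filtering on the naa prefix is an illustrative assumption rather than anything prescribed by this paper.

from pyVmomi import vim

def list_paths(host, naa_prefix="naa."):
    """Print every path and its state for each SCSI device seen by the host."""
    storage = host.configManager.storageSystem.storageDeviceInfo
    # Map the internal LUN key to the device's canonical (naa.*) name
    canonical = {lun.key: lun.canonicalName for lun in storage.scsiLun}
    for lu in storage.multipathInfo.lun:
        name = canonical.get(lu.lun, lu.id)
        if not name.startswith(naa_prefix):
            continue
        print(name)
        for path in lu.path:
            # After a local VPLEX failure, the paths to the local directors report 'dead'
            print("  %-40s %s" % (path.name, path.state))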


Figure 59. World Wide Names of the front-end ports of VPLEX directors


Furthermore, it can also be seen from Figure 59 that the best practices recommended in the earlier section were followed when connecting the ESX hosts to the VPLEX directors at the second site. Although not shown, every host at a particular site is cross-connected to the appropriate directors at the alternate site. This type of configuration ensures that all ESX hosts have alternate paths to the VPLEX directors at the peer site that allow them access to the VPLEX virtual volumes in case of a complete failure of the local VPLEX system.

Figure 60 shows the state of the ESX host svpg-dell-c1-s09, cross-connected to VPLEX directors at both sites, after simulation of a catastrophic failure of the VPLEX system at the local site. The figure also shows the status of the virtual machine that was exhibited earlier during the presentation of the behavior of the VMware vSphere environment without cross-connectivity. The figure clearly shows that the ESX host continues to work properly even though it has lost all access to the local VPLEX system. The virtual machine also continues to operate properly, accessing the data through the VPLEX directors at the surviving site.

It is interesting to note that the datastore and all virtual machines hosted on a VPLEX volume that is not replicated to the second site become inaccessible due to the unavailability of the VPLEX array. This behavior is expected since there are no additional copies of the data that can be used to restore normal operations for the virtual machines hosted on those volumes. However, since the decision to replicate a VPLEX volume or not is based on the SLA needs of a business, the loss of the applications associated with the failed datastore should be tolerable. The datastore, and consequently the virtual machines residing on it, is automatically restored when the VPLEX system recovers. (Due to the reaction of VMware vSphere environments to APD scenarios, virtual machines on failed datastores may require a reboot after the datastore is fully functional.)

It is important to note that some of the virtual machines residing on the ESX hosts in the state shown in Figure 60 will exhibit degraded performance, since all I/Os they generate suffer additional latency due to the distance between the ESX host and the location of the surviving VPLEX system. Therefore, if the failed VPLEX system is not expected to be restored for a significant amount of time, it is advisable to use VMware VMotion to migrate the impacted virtual machines to the surviving site.

Figure 61 shows the paths that are available from the ESX host to the VPLEX volume hosting the datastore, Distributed_DSC_Site_A, discussed in the previous paragraphs. It can be seen from the figure that four of the eight paths are dead. A comparison with Figure 58 shows that the active paths belong to the directors at the surviving site.


Figure 60. State of the vSphere environment with cross-connect after complete failure of the VPLEX system


Figure 61. State of the paths to a datastore after the failure of a local VPLEX system

Figure 62 shows the VMware vSphere environment after the local VPLEX system has recovered. Also shown in the figure, as an inset, are the paths to the datastore, Distributed_DSC_Site_A. It can be seen from the figure and the inset that the restoration of the failed VPLEX system returned the VMware vSphere environment to a normal state.

It should be clear from the discussion in this section and the previous section that a VMware vSphere environment cross-connected to a VPLEX Metro HA system provides the highest level of resilience and the capability to eliminate application unavailability for the vast majority of failure scenarios. The solution can also automatically recover failed virtual machines and the applications they host in situations where disruption to the services cannot be avoided.


Figure 62. VMware vSphere environment after the recovery of a failed VPLEX system

VMware cluster configuration without VPLEX Witness

A VMware HA cluster uses a heartbeat to determine whether the peer nodes in the cluster are reachable and responsive. In case of communication failure, the VMware HA software running on the VMware ESX host normally uses the default gateway of the VMware kernel to determine whether it should isolate itself. This mechanism is necessary since it is programmatically impossible to determine whether a break in communication is due to a server failure or a network failure.

The same fundamental issue, whether the lack of connectivity between the nodes of the VPLEX Clusters is due to a network communication failure or a site failure, applies to VPLEX Clusters that are separated by geographical distances. A network failure is handled by EMC VPLEX by automatically suspending all I/Os to a device (“detaching” it) at one of the two sites based on a set of predefined rules; I/O operations at the other site to the same device continue normally. Furthermore, since the rules can be applied on a device-by-device basis, it is possible to have active devices at both sites in case of a network partition.

Imposition of the rules to minimize the impact of network interruptions does have an impact in case of a site failure. In this case, based on the rules defining the site that detaches in case of a breakdown in communications, the VPLEX Cluster at the surviving site automatically suspends the I/O to some of the devices at the surviving site. To address this, the VPLEX management interface provides the capability to manually resume I/Os to the detached devices. A more detailed discussion of the procedure to perform these operations is beyond the scope of this white paper; the EMC VPLEX Metro Witness Technology and High Availability TechBook should be consulted for further information on VPLEX Metro.

Figure 63 shows the recommended cluster configuration for VMware deployments that leverage devices presented through EMC VPLEX Metro without the VPLEX Witness feature. It can be seen from the figure that VMware vSphere is divided into two separate VMware clusters. Each cluster includes the VMware ESX hosts at one physical data center (Site A and Site B, respectively). However, both VMware clusters are managed under a single Datacenter entity, which represents the logical combination of the multiple physical sites involved in the solution. Also shown in the figure, as an inset, are the settings for each cluster. The inset shows that VMware DRS and VMware HA are active in each cluster, thus restricting the domain of operation of these components to a single physical location.

Figure 63. Configuration of VMware clusters utilizing devices from EMC VPLEX Metro
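Where this two-cluster layout is provisioned programmatically, the clusters can be created under the shared Datacenter object through the vSphere API. The following is a minimal sketch using the pyVmomi Python bindings; the datacenter lookup, cluster names, and HA/DRS settings are assumptions chosen only to mirror the configuration shown in Figure 63.

from pyVmomi import vim

def create_site_clusters(content, datacenter_name="Boston_Metro"):
    """Create one HA/DRS-enabled cluster per site under a single Datacenter."""
    dc_view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datacenter], True)
    datacenter = next(d for d in dc_view.view if d.name == datacenter_name)
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(enabled=True),
        drsConfig=vim.cluster.DrsConfigInfo(enabled=True,
                                            defaultVmBehavior="fullyAutomated"))
    clusters = []
    for name in ("Site_A_Cluster", "Site_B_Cluster"):
        # hostFolder is the folder that holds clusters and standalone hosts
        clusters.append(datacenter.hostFolder.CreateClusterEx(name=name, spec=spec))
    return clusters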


Although Figure 63 shows only two VMware clusters, it is acceptable to divide the VMware ESX hosts at each physical location into multiple VMware clusters. The goal of the recommended configuration is to prevent intermingling of ESX hosts from multiple locations in a single VMware cluster object. Although the best practices recommendation is to segregate the ESX hosts at each site in a separate cluster, VMware and EMC support a stretched cluster configuration that includes ESX hosts from multiple sites. The VMware knowledge base article 1026692, available at http://kb.vmware.com/kb/1026692, should be consulted for further details if such a configuration is desired.

The VMware datastores presented to the logical representation of the conjoined physical data centers (Site A and Site B) are shown in Figure 64. The figure shows that a number of VMware datastores are presented across both data centers. (The creation of a shared datastore that is visible to VMware ESX hosts at both sites is enabled by creating a distributed device in EMC VPLEX Metro. A detailed discussion of the procedures to create distributed devices is beyond the scope of this paper; readers should consult the TechBook EMC VPLEX Architecture and Deployment — Enabling the Journey to the Private Cloud for further information.) Therefore, the logical separation of the VMware DRS and VMware HA domains does not in any way impact, as discussed in the following section, the capability of VMware vCenter Server to transparently migrate the virtual machines operating in the cluster designated for each site to its peer site.

The figure also highlights the fact that a VPLEX Metro configuration in and of itself does not imply the requirement of replicating all of the virtual volumes created on EMC VPLEX Metro to all physical data center locations. (It is possible to present a virtual volume that is not replicated to VMware clusters at both sites. In such a configuration, when the I/O activity generated at the site that does not have a copy of the data is not in the cache of the VPLEX Cluster at that site, it is satisfied by the storage array hosting the virtual volume. Such a configuration can impose severe performance penalties and does not protect the customer in case of unplanned events at the site hosting the storage array; it is generally suitable only when replication is not required or for a one-time migration of virtual machines between data centers.) Virtual machines hosted on datastores encapsulated on virtual volumes with a single copy of the data, and presented to the VMware cluster at that location, are bound to that site and cannot be nondisruptively migrated to the second site while retaining protection against unplanned events. The need to host a set of virtual machines on non-replicated virtual volumes could be driven by a number of reasons, including the business criticality of the virtual machines hosted on those datastores.


Figure 64. Storage view of the datastores presented to VMware clusters

Figure 65 is an extension of the information shown in Figure 64. This figure includes information on the virtual machines and the datastores in the configuration used in this study. The figure shows that a datastore hosts virtual machines that are executing at a single physical location. Also shown in this figure is the WWN of the SCSI device hosting the datastore “Distributed_DSC_Site_A”. The configuration of the VPLEX Metro virtual volume with the WWN displayed in Figure 65 is exhibited in Figure 66. The figure shows that the virtual volume is exported to the hosts in the VMware cluster at Site A.


Figure 65. View of the datastores and virtual machines used in this study


Figure 66. Detailed information of a distributed virtual volume presented to a VMware environment

Figure 67 shows the rules enforced on the virtual volume hosting the datastore Distributed_DSC_Site_A. It can be seen from the figure that the rules are set to suspend I/Os at Site B in case of a network partition. Therefore, the rules ensure that if there is a network partition, the virtual machines hosted on the datastore Distributed_DSC_Site_A are not impacted by it. Similarly, for the virtual machines hosted at Site B, the rules are set to ensure that the I/Os to those datastores are not impacted in case of a network partition.


Figure 67. Viewing the detach rules on VPLEX distributed devices

Nondisruptive migration of virtual machines using VMotion in environments without VPLEX Witness

An example of the capability to migrate running virtual machines between the clusters, and hence between physical data centers, is shown in Figure 68. The figure clearly shows that, from the VMware vCenter Server perspective, the physical location of the data centers does not play a role in providing the capability to move live workloads between sites supported by EMC VPLEX Metro.


Figure 68. vCenter Server allows live migration of virtual machines between sites

Figure 69 shows a snapshot taken during the nondisruptive migration of a virtual machine from one site to another. The figure also shows the console of the virtual machine during the migration process, highlighting the lack of any impact to the virtual machine during the process.


Figure 69. The progression of VMotion between two physical sites

It is important to note that in cases where there are multiple virtual machines on a single distributed RAID-1 VPLEX virtual volume, EMC does not recommend the migration of a single virtual machine from one site to another, since it breaks the paradigm discussed in the earlier paragraphs. A partial migration of the virtual machines hosted on a datastore can cause unnecessary disruption to the service in case of a network partition. For example, after the successful migration of the virtual machine IOM02 shown in Figure 68 and Figure 69, if there is a network partition, the rules in effect on the devices hosting the datastore suspend I/Os at the site on which the migrated virtual machine is executing. The suspension of I/Os results in an abrupt disruption of the services provided by IOM02. To prevent such an event, EMC recommends migrating all of the virtual machines hosted on a datastore, followed by a change in the rules in effect for the device hosting the impacted datastore. The new rules should ensure that the I/Os to the device continue at the site to which the migration occurred. A sketch of performing such a datastore-wide migration programmatically is shown below.
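The VMware-side portion of that recommendation (moving every registered virtual machine on a given datastore to a host at the target site) can be scripted through the vSphere API; the subsequent change to the detach rules is still performed on the VPLEX side. This is a minimal sketch using the pyVmomi Python bindings; the datastore and target host objects are assumed to have been looked up already, and no task monitoring or error handling is included.

from pyVmomi import vim

def vmotion_datastore_vms(datastore, target_host):
    """VMotion every powered-on VM registered on the datastore to the target host.

    The storage is not moved: the distributed VPLEX volume is already visible at
    both sites, so only the compute placement changes.
    """
    tasks = []
    for vm in datastore.vm:
        if vm.runtime.powerState != vim.VirtualMachinePowerState.poweredOn:
            continue
        if vm.runtime.host == target_host:
            continue  # already running at the target site
        tasks.append(vm.MigrateVM_Task(
            host=target_host,
            priority=vim.VirtualMachine.MovePriority.highPriority))
    return tasks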

Changing configuration of non-replicated VPLEX Metro volumes

As mentioned in the previous paragraphs, EMC VPLEX Metro does not restrict the configuration of the virtual volumes exported by the cluster. A VPLEX Metro configuration can export a combination of unreplicated and replicated virtual volumes. Business requirements normally dictate the type of virtual volume that has to be configured. However, if the business requirements change, the configuration of the virtual volume on which the virtual machines are hosted can be changed nondisruptively to a replicated virtual volume and presented to multiple VMware clusters at different physical locations for concurrent access.

Figure 70 shows the datastore Conversion_Datastore, which is currently available only to a cluster hosted at a single site (in this case Site A). Therefore, the virtual machines contained in this datastore cannot be nondisruptively migrated to the second site available in the VPLEX Metro configuration unless remote access is enabled for the device on which the datastore, Conversion_Datastore, has been created, or the configuration of the VPLEX device is converted to a distributed device with copies of the data at both sites. (Technologies such as Storage vMotion can be used to migrate the virtual machine to a VPLEX Metro virtual volume that is replicated and available at both sites, and thus enable the capability to migrate the virtual machine nondisruptively between sites. However, this approach adds unnecessary complexity to the process. Nonetheless, this process can be leveraged for transporting virtual machines that cannot tolerate the overhead of synchronous replication.)

Figure 70. VMware datastore available at a single site in a Metro-Plex configuration

Figure 71 shows the configuration of the virtual volume on which the datastore is located. It can be seen from the figure that the virtual volume contains a single device available at the same site. If the changing business requirement requires the datastore to be replicated and made available at both locations, the configuration can be easily changed as long as sufficient physical storage is available at the second site, which currently does not contain a copy of the data.


Figure 71. Detailed information of a non-replicated Metro-Plex virtual volume

The process to convert a non-replicated device encapsulated in a virtual volume so that it is replicated to the second site and presented to the VMware cluster at the second site is presented below. The process involves four steps:

1. Create a device at the site on which the copy of the data needs to reside. The process to create a device, shown in Figure 72, is independent of the host operating system and was discussed in the section “Provisioning VPLEX storage to VMware environments.”

Figure 72. Creating a device on EMC VPLEX using the GUI


2. The next step is to add the newly created device as a mirror to the existing device that needs the geographical protection. This is shown in Figure 73 and, like the previous step, is independent of the host operating system utilizing the virtual volumes created from the devices.

Figure 73. Changing the protection type of a RAID 0 VPLEX device to distributed RAID 1

3. Create or change the LUN masking on the EMC VPLEX Metro to enable the VMware ESX hosts attached to the nodes at the second site to access the virtual volume containing the replicated devices. Figure 74 shows the results after the execution of the process.


Figure 74. Creating a view to expose the VPLEX virtual volume at the second site

4. The newly exported VPLEX virtual volume that contains the replicated devices needs to be discovered on the VMware cluster at the second site. This process is the same as adding any SCSI device to a VMware cluster and can be scripted as sketched below. Figure 75 shows that the replicated datastore is now available on both VMware clusters at Site A and Site B after the rescan of the SCSI bus.
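The rescan in step 4 can be driven through the vSphere API instead of the vSphere Client. The following is a minimal sketch using the pyVmomi Python bindings; the cluster object for the second site is assumed to have been looked up already, and the second function, which reports which hosts now mount the datastore, is purely illustrative.

def rescan_cluster_storage(cluster):
    """Rescan HBAs and VMFS volumes on every host in the cluster."""
    for host in cluster.host:
        storage_system = host.configManager.storageSystem
        storage_system.RescanAllHba()   # discover the newly exported VPLEX volume
        storage_system.RescanVmfs()     # pick up the VMFS datastore on that volume

def report_datastore_hosts(cluster, datastore_name):
    """Print which hosts in the cluster now see the datastore."""
    for host in cluster.host:
        names = [ds.summary.name for ds in host.datastore]
        print("%-25s %s" % (host.name,
                            "mounted" if datastore_name in names else "missing"))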

Figure 75. Viewing VMware ESX hosts that have access to a datastore

Virtualized vCenter Server on VPLEX Metro

VMware supports virtualized instances of vCenter Server version 4.0 or later. Running the vCenter Server and its associated components in a virtual machine provides customers with great flexibility and convenience, since the benefits of a virtual data center can be leveraged for all components in a VMware deployment. However, in an EMC VPLEX Metro environment, a careless deployment of a vCenter Server running in a virtual machine can expose interesting challenges in case of a site failure. This is especially true if the vCenter Server is used to manage VMware environments also deployed on the same EMC VPLEX Metro cluster.

As discussed in previous paragraphs, in case of a site failure or network partition between the sites, EMC VPLEX automatically suspends all of the I/Os at one site. The site at which the I/Os are suspended is determined by the set of rules that is active when the event occurs. This behavior can increase the RTO in case of a site failure when the VMware vCenter Server is located on an EMC VPLEX distributed volume that is replicated to both sites.

The issue is best elucidated through an example. Consider a VMware environment in which vCenter Server and SQL Server are running on separate virtual machines, but the two virtual machines are hosted on a replicated EMC VPLEX device, D, between two sites, A and B. In this example, assume that the vCenter Server and SQL Server are executing at Site A. The best practices recommendation would therefore dictate that the I/Os to device D be suspended at Site B in case of a link or site failure. This allows the virtual machines hosting the vSphere management applications to continue running at Site A in case of a network partition. (Note that in case of a network partition, the virtual machines executing at Site B on devices that have rules to suspend I/O at Site A continue to run uninterrupted. However, since the vCenter Server located at Site A has no network connectivity to the servers at Site B, the VMware ESX environment at Site B cannot be managed; this includes unavailability of advanced functionality such as DRS and VMotion.) However, if a disruptive event causes all service at Site A to be lost, the VMware environment becomes unmanageable, since the instance of device D at Site B would be in a suspended state unless VPLEX Witness technology is deployed. To recover from this, a number of corrective actions listed below would have to be performed:

1. The I/Os to device D at Site B have to be resumed if VPLEX Witness technology is not used. This can be done through the VPLEX management interface.

2. Once the I/Os to device D have been resumed, the vSphere Client should be pointed to one of the ESX hosts at Site B that has access to the datastore hosted on device D.

3. The virtual machines hosting the vCenter Server and SQL Server instances have to be registered using the vSphere Client.

4. After the virtual machines are registered, the SQL Server should be started first.

5. Once the SQL Server is fully functional, vCenter Server should be started.

These steps would restore a fully operational VMware management environment at Site B in case of a failure at Site A. The example above clearly shows that hosting the vCenter Server on a replicated VPLEX Metro device can impose additional complexity on the environment in case of a site failure. There are two possible techniques that can be used to mitigate this:




• vCenter Server and SQL Server should be hosted on non-replicated EMC VPLEX devices. VMware Heartbeat can be used to transparently replicate the vCenter data between the sites and provide a recovery mechanism in case of site failure. This solution allows the vCenter Server to automatically fail over to the surviving site with minimal to no additional intervention. Readers should consult the VMware vCenter Server Heartbeat documentation for further information.



• vCenter Server and SQL Server can be located at a third, independent site (for example, the site at which the VPLEX Witness machine is located) that is not impacted by the failure of the sites hosting the VMware ESX hosts. This solution allows the VMware management services to remain available even during a network partition that disrupts communication between the sites hosting the EMC VPLEX Metro.

Customers should decide on the most appropriate solution for their environment after evaluating the advantages and disadvantages of each.
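If the manual recovery steps listed above ever have to be performed, steps 2 through 5 (registering the management virtual machines on a surviving ESX host and powering them on in order) can also be scripted directly against that host. The following is a minimal sketch using the pyVmomi Python bindings; the host name, credentials, and datastore paths are hypothetical and are shown only to illustrate the sequence.

import ssl
import time
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def register_and_start(host_name, user, password, vmx_paths):
    """Register each .vmx on a surviving ESX host, then power the VMs on in order."""
    si = SmartConnect(host=host_name, user=user, pwd=password,
                      sslContext=ssl._create_unverified_context())
    try:
        # Connected directly to an ESX host, there is a single, implicit datacenter
        datacenter = si.RetrieveContent().rootFolder.childEntity[0]
        pool = datacenter.hostFolder.childEntity[0].resourcePool
        for vmx in vmx_paths:  # order matters: SQL Server first, vCenter Server last
            task = datacenter.vmFolder.RegisterVM_Task(
                path=vmx, asTemplate=False, pool=pool)
            while task.info.state not in ("success", "error"):
                time.sleep(2)          # naive poll; a real script would time out
            if task.info.state == "success":
                task.info.result.PowerOnVM_Task()
    finally:
        Disconnect(si)

# Hypothetical paths for the two management VMs, SQL Server first:
# register_and_start("esx-siteb-01.example.com", "root", "password",
#                    ["[Device_D_Datastore] SQL01/SQL01.vmx",
#                     "[Device_D_Datastore] VC01/VC01.vmx"])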

Conclusion

EMC VPLEX running the EMC GeoSynchrony operating system is an enterprise-class, SAN-based federation technology that aggregates and manages pools of Fibre Channel attached storage arrays that can be either collocated in a single data center or spread across multiple data centers geographically separated by MAN distances. Furthermore, with a unique scale-up and scale-out architecture, EMC VPLEX’s advanced data caching and distributed cache coherency provide workload resiliency, automatic sharing, balancing, and failover of storage domains, and enable both local and remote data access with predictable service levels.

A VMware vSphere data center backed by the capabilities of EMC VPLEX provides improved performance, scalability, and flexibility. In addition, the capability of EMC VPLEX to provide nondisruptive, heterogeneous data movement and volume management functionality within synchronous distances enables customers to offer nimble and cost-effective cloud services spanning multiple physical locations.

References

The following documents include more information on VPLEX and can be found on EMC.com and Powerlink:
• Implementation and Planning Best Practices for EMC VPLEX Technical Notes
• EMC VPLEX Metro Witness – Technology and High Availability
• EMC VPLEX 5.0 Product Guide
• Implementation and Planning Best Practices for EMC VPLEX
• EMC VPLEX 5.0 Architecture Guide


The following documents include more information on EMC with VMware products and can be found on EMC.com and Powerlink:
• Using VMware vSphere with EMC Symmetrix Storage white paper
• Using EMC Symmetrix Storage in VMware vSphere Environments TechBook
• VSI for VMware vSphere: Storage Viewer Product Guide
• EMC PowerPath/VE for VMware vSphere Version 5.4 and Service Pack Installation and Administration Guide (Powerlink only)

The following can be found on the VMware website:
• VMware vSphere Online Library
• vCenter Server Heartbeat Reference Guide
