I have been hearing a lot of interest from my clients lately about stretched vSphere clusters. I can certainly see the appeal from a simplicity standpoint, at least on the surface. Let's take a look at the perceived benefits, risks, and the reality of stretched vSphere clusters today.
First, let’s define what I mean by a stretched vSphere cluster. I am talking about a vSphere (HA / DRS) cluster where some hosts exist in one physical datacenter and some hosts exist in another physical datacenter. These datacenters can be geographically separated or even on the same campus. Some of the challenges will be the same regardless of the geographic location.
To keep things simple, let’s look at a scenario where the cluster is stretched across two different datacenters on the same campus. This is a scenario that I see attempted quite often.
This cluster is stretched across two datacenters. For this example, let's assume that each datacenter has an IP-based storage array that is accessible to all of the hosts in the cluster, and that the link between the two datacenters is Layer 2. This means that all of the hosts in the cluster are Layer 2 adjacent. At first glance, this configuration may be desirable because of its perceived elegance and simplicity. Let's take a look at the perceived functionality.
- If either datacenter has a failure, the VMs should be restarted on the other datacenter's hosts via vSphere High Availability (HA).
- No need for manual intervention or a tool like Site Recovery Manager.
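Before relying on that behavior, it is worth confirming that HA and DRS are actually enabled on the cluster. Here is a minimal sketch using the pyVmomi library; the vCenter address, credentials, and cluster name are assumptions for illustration.

```python
# Minimal pyVmomi sketch: verify HA (das) and DRS are enabled on the cluster.
# The vCenter address, credentials, and cluster name below are assumptions.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab shortcut; validate certificates in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ctx)
content = si.RetrieveContent()

# Locate the stretched cluster by its (assumed) name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "StretchedCluster01")

cfg = cluster.configurationEx
print("HA enabled:     ", cfg.dasConfig.enabled)
print("Host monitoring:", cfg.dasConfig.hostMonitoring)
print("DRS enabled:    ", cfg.drsConfig.enabled)
```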
Unfortunately, perceived functionality and actual functionality differ in this scenario. Let’s take a look at an HA failover scenario from a storage perspective first.
- If virtual machines fail over from hosts in one datacenter to hosts in the other datacenter, their storage will still be accessed from the originating datacenter.
- This causes hosts that are local to the surviving datacenter to access storage that is not local to them, across the inter-datacenter link, as shown in the diagram below.
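One way to spot this condition is to reuse the connection from the earlier sketch and report which host each VM is running on and which datastores back it. The site-prefixed host and datastore names assumed here are just an illustrative naming convention.

```python
# Continuing with the `cluster` object from the earlier sketch: list each VM,
# the host it runs on, and the datastores backing it. With site-prefixed host
# and datastore names (an assumed convention), a VM running on a site-A host
# but backed by a site-B datastore is easy to spot.
for host in cluster.host:
    for vm in host.vm:
        backing = ", ".join(ds.name for ds in vm.datastore)
        print(f"{vm.name}: host={host.name}  datastores={backing}")
```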
This situation is not ideal in most cases, and if the originating datacenter is completely isolated, the storage cannot be accessed at all. Let's take a look at what happens when one datacenter loses communication with the other datacenter, but not with its own local hosts. This is depicted in the diagram below.
- Prior to vSphere 5.0, if the link between the datacenters went down, or some other communication disruption occurred at this point in the network, each set of hosts would consider the other set to be down. This is a problem because each datacenter would attempt to bring up the other datacenter's virtual machines even though they were still running, which is known as a split-brain scenario.
- As of vSphere 5.0, HA treats this as a network partition: the hosts in each datacenter form their own partition and continue to operate as two independent clusters (although with some limitations) until connectivity between the datacenters is restored. A simplified sketch of the heartbeat logic behind this behavior follows this list.
- However, this scenario is still not ideal because of the cross-datacenter storage access described above.
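To make the vSphere 5.0 behavior concrete, here is a greatly simplified sketch of the decision an HA master effectively makes about a host it can no longer reach over the management network. The function and its inputs are illustrative only; the real FDM agent logic is considerably more involved.

```python
# Simplified illustration of post-5.0 HA reasoning: a host's VMs are only
# restarted elsewhere when the master sees neither network heartbeats nor
# datastore heartbeats from that host. Names here are made up, not FDM code.

def master_view_of_host(network_heartbeats: bool,
                        datastore_heartbeats: bool,
                        host_declared_isolation: bool = False) -> str:
    if network_heartbeats:
        return "connected"        # normal operation, nothing to do
    if datastore_heartbeats:
        # The host is still alive and writing to its heartbeat datastores,
        # so restarting its VMs elsewhere would create a split brain.
        return "isolated" if host_declared_isolation else "partitioned"
    return "failed"               # no heartbeats at all: restart its VMs


# If the master can still observe a silent host's datastore heartbeats, the
# host is treated as partitioned (or isolated) rather than failed, so its
# still-running VMs are not powered on a second time.
assert master_view_of_host(False, True) == "partitioned"
assert master_view_of_host(False, False) == "failed"
```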
So what can be done? Well, beyond VM-to-Host affinity rules (more on those in a moment), if the sites are truly meant to be active/standby (with the standby site perhaps running lower-priority VMs), the cluster should be split into two separate clusters, perhaps even managed by separate vCenter instances (one for each site) if Site Recovery Manager (SRM) will be used to automate the failover process. If there is a use case for a single stretched cluster, then external technology needs to be used. Specifically, the storage access problem can be addressed by using a technology like VPlex from EMC. In short, VPlex allows one to have a distributed (spanning the two datacenters) virtual volume that can be used as a datastore in the vSphere cluster. This is depicted in the diagram below.
A detailed explanation of VPlex is beyond the scope of this post. At a high level, the distributed volume can be accessed by all of the hosts in the stretched cluster, and VPlex keeps track of which virtual machines should be running against the local storage that backs the distributed virtual volume. In the case of a complete site failure, VPlex can determine that the virtual machines should be restarted on the underlying storage that is local to the other datacenter's hosts.
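Coming back to the VM-to-Host affinity rules mentioned earlier: whether or not a distributed volume is in place, DRS "should run on" rules are the usual way to keep each site's VMs on that site's hosts during normal operation. Below is a rough pyVmomi sketch; the group and rule names, along with the site_hosts and site_vms inputs, are assumptions.

```python
# Hypothetical sketch: keep a group of VMs on datacenter A's hosts with a DRS
# "should run on hosts in group" rule, via pyVmomi. Group/rule names and the
# site_hosts / site_vms inputs are made up for illustration.
from pyVmomi import vim

def add_site_affinity(cluster, site_hosts, site_vms):
    """cluster: vim.ClusterComputeResource
    site_hosts / site_vms: lists of vim.HostSystem / vim.VirtualMachine
    objects that belong to datacenter A."""
    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[
            vim.cluster.GroupSpec(
                operation="add",
                info=vim.cluster.HostGroup(name="dcA-hosts", host=site_hosts)),
            vim.cluster.GroupSpec(
                operation="add",
                info=vim.cluster.VmGroup(name="dcA-vms", vm=site_vms)),
        ],
        rulesSpec=[
            vim.cluster.RuleSpec(
                operation="add",
                info=vim.cluster.VmHostRuleInfo(
                    name="dcA-vms-should-run-in-dcA",
                    enabled=True,
                    mandatory=False,   # "should run", not "must run"
                    vmGroupName="dcA-vms",
                    affineHostGroupName="dcA-hosts")),
        ])
    # Apply the change to the existing cluster configuration.
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```

Because this is a "should" rule rather than a "must" rule, HA remains free to restart those VMs on the other site's hosts if the preferred site fails.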
Technology is bringing us closer to location-aware clusters. However, we are not quite there yet for a number of use cases, as external equipment and functionality tradeoffs need to be considered. If you have the technology and can live with the functionality tradeoffs, then a stretched cluster may work for your infrastructure. For many, though, the simpler design choice continues to be separate clusters.