If you work in IT, you have most likely heard of hyperconverged infrastructure (HCI) by now. It has been gaining attention over the last few years, and 2019 has already shown a big push towards HCI as the preferred virtual infrastructure architecture.
HCI can sound very impressive—who doesn’t want deeper levels of abstraction and greater levels of automation in one single system with simplified management? Caution: if it sounds too good to be true, it most likely is.
We want to forewarn anyone contemplating the switch to HCI with VMware vSAN. There are things HCI advocates are not telling you, and you could end up spending top dollar for a subpar solution that lacks the resiliency, availability and manageability of a traditional SAN and server VMware vSphere architecture.
Balancing the Nodes
For a hyperconverged infrastructure to deliver the convenience and functionality promised, the nodes have to be completely balanced. A node cannot be storage-heavy or compute-heavy, and that balance becomes increasingly difficult to maintain as workloads grow and change.
We’ve recently come across several organizations that have experienced catastrophic data loss with their VMware vSAN HCI because their vendor undersized the vSAN cluster(s) to be cost competitive with the traditional Storage Area Network (SAN) proposals that were also under consideration at the time of purchase.
This is becoming somewhat of an epidemic, so it’s extremely important that you understand the risks associated with an under-sized VMware vSAN cluster architecture when planning a data center refresh.
Failures to Tolerate
VMware vSAN uses a setting called Failures To Tolerate (FTT) to define data redundancy in a vSAN HCI cluster. The default setting is FTT=1, which means the cluster is designed to tolerate a single node going offline without any data loss.
A higher level of redundancy should be used to protect against multiple nodes failing concurrently, which gives the cluster a higher degree of reliability and uptime. However, this comes at the expense of maintaining more copies of the data, which increases the number of writes needed to complete one transaction.
Moreover, a higher degree of redundancy requires a larger cluster: the minimum number of cluster nodes is 2 × FTT + 1. That is why many HCI vendors conveniently skip over the FTT discussion altogether and instead assume the default of FTT=1 to remain cost competitive with traditional SAN solutions.
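To make the arithmetic concrete, here is a minimal Python sketch of the 2 × FTT + 1 rule quoted above. The function names are our own and purely illustrative (they are not part of any VMware tool or API), and the copy count assumes a mirroring-style storage policy in which each object is kept as FTT + 1 full copies.

```python
def min_nodes(ftt: int) -> int:
    """Minimum number of cluster nodes for a given FTT (2 * FTT + 1)."""
    if ftt < 0:
        raise ValueError("FTT must be zero or greater")
    return 2 * ftt + 1


def data_copies(ftt: int) -> int:
    """Full copies of each object kept on the cluster.

    Assumption: a mirroring-style policy, where tolerating FTT
    failures means keeping FTT + 1 copies of the data.
    """
    if ftt < 0:
        raise ValueError("FTT must be zero or greater")
    return ftt + 1


# Compare the default (FTT=1) with higher redundancy levels.
for ftt in (1, 2, 3):
    print(f"FTT={ftt}: at least {min_nodes(ftt)} nodes, "
          f"{data_copies(ftt)} copies of every object")
```

Under these assumptions, moving from FTT=1 to FTT=3 raises the node floor from 3 to 7 and doubles the number of copies written for every object, which is exactly the cost that tends to get left out of a price-matched proposal.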
Choosing and Planning for HCI
To put it plainly, you will want to use FTT=3 with balanced nodes for any mission critical workloads, which translates to having no fewer than 7 nodes (preferably 8 nodes) in a VMware vSAN cluster. This is the only way to achieve acceptable protection from data loss, and it also minimizes the performance impact on the cluster when hard drives and/or nodes fail, or when you need to put a vSAN node in maintenance mode.
Lastly, in our humble opinion, the node count should be doubled if you are planning to stretch a vSAN cluster across two data centers using VMware vSphere Metro Storage Cluster; at the very least, it should be more than 8 nodes. If not, the VMware vSAN cluster becomes a ticking time bomb as the server hardware ages and degrades over time.
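The sizing guidance in this section can be turned into a quick sanity check for a vendor proposal. The Python sketch below is only an illustration of our recommendations, not an official VMware sizing method: the function names, the single spare node (our reading of "preferably 8 nodes") and the straight doubling for a stretched cluster are all assumptions taken from the opinions stated above.

```python
def recommended_nodes(ftt: int, spare_nodes: int = 1, stretched: bool = False) -> int:
    """Node count suggested by this article for a given FTT.

    Assumptions (ours, not VMware's): start from the 2 * FTT + 1
    minimum, add spare capacity for maintenance mode and failures,
    and double the total when the cluster spans two data centers.
    """
    if ftt < 0 or spare_nodes < 0:
        raise ValueError("ftt and spare_nodes must be zero or greater")
    nodes = 2 * ftt + 1 + spare_nodes
    return nodes * 2 if stretched else nodes


def is_undersized(proposed_nodes: int, ftt: int, stretched: bool = False) -> bool:
    """True if a proposed cluster falls short of the recommendation."""
    return proposed_nodes < recommended_nodes(ftt, stretched=stretched)


# A hypothetical 4-node proposal checked against FTT=3.
print(recommended_nodes(3))                  # single-site recommendation
print(recommended_nodes(3, stretched=True))  # stretched across two sites
print(is_undersized(4, ftt=3))               # does the proposal fall short?
```

Running a proposed node count through a check like this before signing makes it obvious when a quote only works because it quietly assumes FTT=1.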
We are fully aware that what we’ve outlined contradicts what most HCI vendors are telling you about VMware vSAN, so we encourage you to do more research, talk to more experts, and give us a call if you have any questions or want to discuss whether a hyperconverged infrastructure is right for you. Doing that homework up front will certainly save you money and countless headaches!