The last time I put together a home lab I took advantage of VMware’s free software program. This program is a really good business model. Medium-to-large virtualization deployments are simply rare; there aren’t that many companies that really need these gigantic deployments. So how does VMware get engineers familiar with its offerings? It gives away a free, feature-restricted version of its software.
This free version used to lag a version or two behind production: if VMware was shipping version 6, 5.5 would be available under a free license. Nowadays they appear instead to restrict access to certain features and limit the number of VM deployments. Either way, it lets curious engineers use the software and gain familiarity. When one of these engineers is later in a position to make a decision or recommendation about deploying a new virtualization cluster at a company, they will go with what they know: VMware. It is a win-win.
Throughout this series I have referred to my storage server. That is slightly inaccurate, because it implies a bare-metal deployment. It is not one. It is a VMware ESXi 6.0 deployment with, in practice, a single VM. I have added a small VM from time to time, but essentially there is one big storage VM that takes all of the processors and 80% of the memory. Since I never really used the host for anything else, I started thinking of it as the storage server.
I deployed the storage server this way because I wanted to be more future-proof than usual. When I “upgraded” from Napp-it to FreeNAS, I did not really need to risk anything. I just deployed a new VM, gave it control of the HBA via passthrough, and voilà, the ZFS pool was essentially migrated. If this had failed, I would simply have reassigned the HBA back to the old Napp-it VM, and it would (very probably) have booted up like normal. I never had to, so I can’t confirm it, but if the HBA can move to a new VM it stands to reason it could have moved back to the old one. This is one of the biggest advantages that virtualization, via the hypervisor, has made possible.
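For the curious, that HBA reassignment can also be scripted against the vSphere API rather than clicked through the client. Below is a minimal sketch using pyVmomi (VMware’s Python SDK for the vSphere API); the host name, credentials, VM name, and PCI address are all placeholders rather than my actual setup, and the deviceId formatting follows the convention from the pyVmomi community samples.

```python
# Minimal sketch: attach a passthrough PCI device (an HBA) to a VM via pyVmomi.
# All names and addresses below are placeholders, not my actual lab values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

HBA_ADDR = "0000:03:00.0"  # PCI address of the HBA (placeholder)

ctx = ssl._create_unverified_context()  # lab host with a self-signed cert
si = SmartConnect(host="esxi.lab.local", user="root",
                  pwd="hunter2", sslContext=ctx)
content = si.RetrieveContent()

# Walk the inventory for the target VM by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "freenas")
view.Destroy()

# Ask the host which passthrough devices it can offer this VM,
# then pull out the entry matching the HBA's PCI address.
target = vm.environmentBrowser.QueryConfigTarget(host=None)
info = next(p for p in target.pciPassthrough
            if p.pciDevice.id == HBA_ADDR)

# Build the device backing from the host's own description of the device.
backing = vim.vm.device.VirtualPCIPassthrough.DeviceBackingInfo(
    id=info.pciDevice.id,
    deviceId=format(info.pciDevice.deviceId % 2**16, "x"),
    systemId=info.systemId,
    vendorId=info.pciDevice.vendorId,
)
change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
    device=vim.vm.device.VirtualPCIPassthrough(backing=backing),
)

# Reconfigure the VM (it must be powered off for a passthrough add).
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
Disconnect(si)
```

Moving the pool back would just be the reverse: remove the device from the new VM and run the same add against the old one.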
I even remember at one point wanting to convert all of my servers and workstations into deployments like this: a dedicated VM given basically all of the PCI hardware. The only thing that prevented it was that video cards simply didn’t work like this (more on that later when I discuss getting the VR VM working). I could even have multiple VMs that each get access to all of the hardware and decide which one to boot, with no multi-boot loaders like GRUB or Boot Camp to deal with. Anyone, like myself, who has ever dealt with those knows how picky they can be. Plus it isolates the hardware nicely: there’s no risk of accidentally wiping a hard drive or partition that belongs to one of the other installed operating systems. Virtualization has more than proved its value.
For this project, I wanted to re-examine the base decision of which hypervisor to use. This is something I try to do in my professional life as well. Many times engineering decisions aren’t actually made; they simply move forward on institutional momentum. I like to make these decisions explicit. Far too often, companies and engineers find themselves maintaining an application, deployment, or infrastructure that they can barely keep working because they never actually examined, or re-examined, the base decisions behind it. If I proceed with VMware, I want it to be because I examined the options and decided to continue using it. Now, VMware definitely has an advantage in this situation, because I am more familiar with it, but I still want to compare it with its competitors.
There are five main hypervisors I am considering: VirtualBox, VMware ESXi, KVM/QEMU, Xen, and Hyper-V. This isn’t to say there are no others, but these are the big ones I think are worth considering. I am not planning to talk through all of the various features and comparisons between the hypervisors; many have done that better than I can. There is a plethora of comparisons out there: VirtualBox vs VMware [1], VirtualBox vs Hyper-V [2], VMware vs KVM [3], and even a general overview of many options on Wikipedia [4].
The truth about this selection is that, for basically every important feature, one could pick any of these and likely make it work. Tables showing how large the VMs can be are all over the place [5]. They will all meet my basic needs for processors, RAM, storage space, and networking. Therefore, I am going to work through only the fringe cases that apply to this project, and then consider non-technical factors.
The largest fringe case for my project is complete support for PCI passthrough, or rather, the largest support base for it possible. PCI passthrough is what enables a VM to natively control a PCI device; it can even split a device with multiple controllers between VMs. This is now a core virtualization feature. The only wrinkle is in how the hypervisor is installed: VirtualBox, for instance, needs to be installed on Linux for PCI passthrough to work. I do not consider any of the five to be particularly advantageous purely from a technology-support point of view. The only feature I would highlight is that VMware, Xen, and Hyper-V support clustering natively. It isn’t that VirtualBox can’t be clustered, it can with a management console, but native clustering may be more interesting from a growth and future-planning perspective.
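As an aside, what can actually be split between VMs is governed by the platform’s IOMMU groups: devices in the same group generally have to be passed through together. On a Linux host (the setup VirtualBox passthrough requires), a quick sketch like the one below lists the groups from sysfs; the paths are the standard kernel layout, not anything hypervisor-specific.

```python
# Minimal sketch: list IOMMU groups on a Linux host to see which PCI
# devices can be passed through independently of one another.
from pathlib import Path

groups = Path("/sys/kernel/iommu_groups")
if not groups.exists():
    # No groups usually means VT-d/AMD-Vi is disabled in firmware,
    # or the iommu kernel parameters are not set.
    print("No IOMMU groups found; passthrough is not available.")
else:
    for group in sorted(groups.iterdir(), key=lambda p: int(p.name)):
        devices = ", ".join(d.name for d in (group / "devices").iterdir())
        print(f"group {group.name}: {devices}")
```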
Non-technical factors are likely to be the deciding factor here. All of these platforms are mature, and they will all serve my basic purposes. So I want to take a brief digression into something I have dealt with in my professional career.
When I am deciding what technologies to use in designing a new application, I have a wealth of options, but let’s say I am deciding between a newer language like Go and an older one like Python. In my opinion it would be inaccurate to say one is strictly “better” than the other. Go was designed with parallelism in mind, but many of the same capabilities can be achieved in Python, just differently. In addition, Python has quite a few other libraries that have been built up around it; if my project were a data science one, for instance, Python would be the natural choice.
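As a toy illustration of “the same, just differently” (my example, not drawn from either language’s docs): where Go would fan work out to goroutines, Python can reach a similar place with a standard-library process pool.

```python
# Minimal sketch: fan CPU-bound work out across cores in Python,
# roughly where Go would spawn goroutines and collect on a channel.
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # Stand-in for real CPU-bound work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(crunch, [10_000, 20_000, 30_000])))
```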
However, let’s look beyond just the basic technologies involved. Stack Overflow does a great yearly survey about technology use [6]. A full 44% of developers say they have used Python; only 8% have used Go. This tells me that if I need to hire more developers for my project, I will have a much easier time finding people familiar with Python than with Go.
In addition, Python developers are typically paid $77,000–$111,000 a year [7], while the average Go developer is paid roughly $136,000 a year [8]. This makes intuitive sense: if I need to hire a Go developer, and Go developers are less common, they should command a higher salary. So a large portion of my decision should be based on engineer availability and cost, not just technology.
I ran into this exact problem in my last position. Before I joined, the decision had been made to build the application in Elixir. Now, I enjoy Elixir; it brings a lot of fun to my daily programming tasks, and it has distinct threading advantages over its competitors. But from a hiring point of view? It was darn near impossible to find developers who didn’t command very large salaries, something north of $200,000 a year, and hiring contractors was even more expensive. That made the original decision difficult to live with. If I had been around, I likely would have pushed for something a bit less cutting-edge, such as Python, or probably Go, which is similar but has far greater reach.
Setting that aside, one can easily see how seductive the cutting-edge trap can be. Engineers like working with the newest and best. However, as an engineering leader, it is not always wise to bind the organization to that. I like to optimize for maintainability, in both code and technologies. That way, if my best engineer gets an offer I can’t compete with, I can more easily find engineers to replace them and keep the project moving.
I don’t mean to suggest that I will never choose the latest and greatest. I tend to think programming languages are mostly about learning syntax, provided one has a solid computer science foundation. So if I have a large pool of talented engineers, I’m a lot less concerned about starting a project with a newer technology; those engineers will be easier to move around as the project grows or shrinks.
I also think the salary issue can be mitigated a bit by a larger pool of talented engineers: essentially trading time spent learning the language and gaining familiarity with the newer technology for money. Strong developers will enjoy the learning. However, as I’m sure my colleagues know, this isn’t a price everyone is willing to pay.
I don’t want to go too much further down this rabbit hole, talking about how technical debt builds up, or how investing in engineers, even independent of a current goal, is always a good idea. These all matter to an overall organization, but hopefully my point here is clear: non-technical issues are often just as important as, if not more important than, the actual technology being evaluated.
So, here is how that relates to my original question: what technologies am I likely to encounter in my career? Put another way, if I am going to keep familiarity with one technology, which is the best one to keep up with? The easiest way to answer that is to look at the market share of the various virtualization technologies.
There are reports one can purchase for this information [9], but I am not willing to pay that price for a home project. I found a couple of other options [10] [11] (dated, but still helpful) [12]. These confirm what I know from my days as the engineering leader at a web hosting company: VMware is the 800-lb gorilla in the room. We had numerous requests for private managed vSphere clusters, and almost none for anything else. Still, I wanted to double-check where VMware stood.
Note that I don’t really want to get into AWS, Google Cloud, or Azure for a home lab. These are more about management consoles for data-center-level operations. It’s not that they are not virtualization technologies, they are; it’s that it is not simple for me to deploy a private AWS cloud for myself. AWS is built on Xen [13], Google Cloud on KVM/QEMU [14], and Azure on a custom version of Hyper-V [15], but I am not aware of a way to deploy a Xen server and then host a private AWS console web application on it. In addition, there does not appear to be much consensus among the large tech companies about which virtualization technology is preferred. So I will default to general market share here.
VMware probably has the largest market share, and is therefore the one I have a slightly better chance of encountering. I should also note that the concepts I use will inevitably have counterparts across all of the various virtualization technologies.
With this in mind, I narrowed the field to Xen, Hyper-V, and VMware. The truth is that KVM/QEMU is fully featured but, like most things Linux, takes a great deal of effort to get working correctly. This is the same reason I rarely recommend Linux to my non-engineering friends, at least not unless I want to be on the hook for tech support for years.
I do not mind that effort myself, but from a maintenance point of view it is easy to decide to try one of the more easily managed hypervisors. VirtualBox seems to have dominated running multiple operating systems on one’s workstation, much like Parallels on the Mac, but its features as a bare-metal hypervisor are lacking. Hyper-V is out because it does not support FreeBSD [16], which is what FreeNAS is built on, and I would prefer to keep FreeNAS. That leaves VMware and Xen, both of which support a very wide range of operating systems. However, determining whether Xen supports Windows 10 is difficult: it isn’t listed on the project wiki [17], the Wikipedia page [18], or guides [19]. The Citrix version clearly does [20]. On balance I believe Xen probably does support Windows 10, but I am wary when so many sources leave support for such an important operating system unclear.
I will continue with VMware. There is solid reason to believe it controls a substantial portion of the virtualization market, and that it will support all of my expected use cases. There is reason to suspect the other options would be more difficult to work with, if they can support everything at all. I am not certain this is the best choice; a good argument can be made that Xen familiarity, especially given AWS’s current market dominance, may be more useful. However, I do not believe choosing VMware will be much of a detriment. I would also be remiss if I didn’t admit that my previous familiarity played a role here: I know, and trust, my ability to work through ESXi issues, less so Xen.
References
[1] https://www.nakivo.com/blog/vmware-vs-virtual-box-comprehensive-comparison/
[2] https://www.nakivo.com/blog/hyper-v-virtualbox-one-choose-infrastructure/
[3] https://www.redhat.com/en/topics/virtualization/kvm-vs-vmware-comparison
[4] https://en.wikipedia.org/wiki/Comparison_of_platform_virtualization_software
[6] https://insights.stackoverflow.com/survey/2020#technology-programming-scripting-and-markup-languages
[10] https://www.datanyze.com/market-share/virtualization–193
[11] https://www.smartprofile.io/analytics-papers/vmware-far-largest-server-virtualisation-market/
[12] https://www.storagereview.com/news/vmware-takes-42-q1-hci-software-market-share
[16] https://en.wikipedia.org/wiki/Hyper-V
[17] https://wiki.xenproject.org/wiki/Category:HowTo
[18] https://en.wikipedia.org/wiki/Xen
[19] https://www.virtualmin.com/documentation/cloudmin/windows
[20] https://docs.citrix.com/en-us/xenserver/7-1/system-requirements/guest-os-support.html