Server virtualization is infectious. It is a technology that tends to take off at a record pace in IT organizations that have adopted it as part of their infrastructure. In my experience, organizations fall into one of two broad categories when it comes to their virtualization initiatives: they either treat server virtualization as a "Strategic Initiative" or they use it as a "Tactical Tool." Let's explore these categories, and then I'll discuss some infrastructure options for a structured virtual infrastructure.
Server Virtualization as a “Tactical Tool”
I have seen this in many organizations. The IT group needed to test a new application or needed to spin up a new server quickly. What’s the quickest way to spin up a new server? Server virtualization, of course. So, here is how I see many infrastructures get started:
- IT department downloads the free vSphere Hypervisor
- IT department proceeds to click next until the hypervisor is installed
- IT department spins up a few virtual machines on the hypervisor
- "Life is good. That was easy, wasn't it?"
- "It's so easy and cool that demand for more virtual machines creeps up."
- Pretty soon the IT department wants to host production workloads on the hypervisor
- "But wait, what about failover, live migration, and so on? Don't I need a SAN for that?"
- "How much storage do I need?"
- IT department calculates how much space they are using on their servers, or worse yet, how much disk space in total is available on all of their servers combined
- “Wow! I need a lot of space to host all of those servers”
- IT department buys large slow “shared disks” of some variety to satisfy the SAN requirement
- IT department sets up vCenter on a spare server
- IT department clicks next until a few hypervisors are installed and added to the new cluster complete with “shared storage”
- Now there is some equipment and software in place to host virtual machines
- IT department spins up new virtual machines until they are suddenly out of capacity or things are “just slow and error prone”
- Virtualization stalls because there is no more capacity and there is a lack of trust in the virtual infrastructure as it stands
- IT department starts purchasing physical servers again for “critical” applications
- "Now DR must be provided for those 'critical' applications. How can we protect them?"
- “The easiest thing to do would be to leverage virtualization, but we’re out of capacity and the platform has been problematic”
- “What do we need to do to leverage virtualization on a larger scale in our infrastructure?”
It's a vicious cycle, and it is why I continue to see companies that are only 20-40% virtualized. It is great that server virtualization technology has been embraced. However, without proper planning and a structured approach to building and maintaining the virtual infrastructure, many organizations will stall at that 20-40% mark, leaving the benefits of server virtualization, and real money, on the table.
So, this series of posts will explore the alternative: server virtualization as a "Strategic Initiative." This is the approach I take with my clients at TBL, whether building a structured virtual infrastructure from the ground up or remediating a "tactical tool" virtual infrastructure to the point that it becomes an effective platform to host the organization's infrastructure moving forward.
Physical Infrastructure Choices
There are many options when it comes to virtual infrastructure hardware. Before any hardware choices are made, a capacity planning engagement should occur. Notice that capacity planning was not mentioned at all in the "Server Virtualization as a Tactical Tool" scenario. Look at this infrastructure as if it is going to host all of your physical servers, even if you will not start there. How else can you determine whether the hardware purchased for a new virtual infrastructure is sufficient if capacity planning is not performed? I can't count the number of times I have heard the equivalent of the following:
- “These host servers and storage should do since my physical servers don’t really do much.”
How do you know that your physical servers don't do much unless you have performed capacity planning? Is it a gut feeling? I have seen many gut feelings cause server virtualization to stall. We need to examine the four "core" resources (CPU, RAM, disk, and network) to determine not only the capacity but also the level of performance needed. After a proper capacity planning engagement, we can determine the "feeds and speeds" of our hardware. However, in a structured virtual infrastructure the hardware choice becomes about more than just raw specs. Let's examine some options.
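To make that concrete, here is a minimal sketch of the kind of arithmetic a capacity planning exercise boils down to. The per-server peak figures, headroom factor, and host specs below are all hypothetical placeholders; a real engagement would pull weeks of measured performance data from monitoring tools rather than three made-up rows.

```python
import math

# Hypothetical capacity planning sketch: sum peak demand across the four core
# resources and size hosts with headroom. Every number here is a placeholder.

# Observed peak per physical server: (CPU GHz, RAM GB, disk IOPS, network Mbps)
servers = [
    (2.4, 6, 350, 120),
    (1.1, 12, 900, 80),
    (3.0, 24, 1500, 400),
]

HEADROOM = 1.25  # 25% buffer for growth and failover (assumption)

# Aggregate peak demand across all candidate servers
cpu_ghz, ram_gb, iops, net_mbps = (sum(col) * HEADROOM for col in zip(*servers))

# Hypothetical host spec: 2 sockets x 8 cores @ 2.6 GHz, 192 GB RAM
host_cpu_ghz = 2 * 8 * 2.6
host_ram_gb = 192

hosts_needed = max(math.ceil(cpu_ghz / host_cpu_ghz), math.ceil(ram_gb / host_ram_gb))

print(f"Demand with headroom: {cpu_ghz:.1f} GHz CPU, {ram_gb:.0f} GB RAM, "
      f"{iops:.0f} IOPS, {net_mbps:.0f} Mbps")
print(f"Hosts needed on CPU/RAM alone: {hosts_needed}")
```

The point is not the arithmetic itself; it is that sizing should be driven by measured peaks across all four resources, not by adding up installed disk space or going on a gut feeling.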
Traditional Rackmount Server Infrastructure
This is the standard virtual infrastructure that has been around for a while. With this approach, you take rackmount servers as hosts and provide shared storage via iSCSI, NFS, or Fibre Channel. A diagram of this approach can be seen below.
This infrastructure is well understood. However, its scalability is somewhat limited. Typically, a virtual infrastructure host will have eight to ten cables attached to it in a 1GbE environment, because of the way traffic should be separated in a virtual infrastructure. This is fine for a few hosts, but as the infrastructure scales, the number of cables and ports required becomes problematic. I have seen environments where shortcuts were taken to free up enough ports by combining virtual infrastructure traffic types that should be kept separate. As more hosts are needed, a better approach to scaling the infrastructure must be in place.
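As a rough illustration of where those eight to ten cables come from and why they become a problem, here is a small sketch. The traffic breakdown and the host counts are assumptions for the sake of the arithmetic, not a prescribed design.

```python
# Where roughly eight cables per host come from in a 1GbE design: typical traffic
# separation, with a redundant pair of uplinks per traffic type (assumed layout).
traffic_types = {
    "management": 2,
    "vMotion": 2,
    "virtual machine networks": 2,
    "IP storage (iSCSI/NFS)": 2,
}
cables_per_host = sum(traffic_types.values())  # 8, before any additional uplinks

for hosts in (3, 8, 16, 32):
    print(f"{hosts:>2} hosts -> {hosts * cables_per_host:>3} cables and 1GbE switch ports")
```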
Converged Rackmount Server Infrastructure
This infrastructure consolidates the traditional 1GbE infrastructure into a 10GbE infrastructure by connecting to an FCoE or straight 10GbE switch. This allows more bandwidth and cuts down on the port count required as the infrastructure scales.
As this infrastructure is scaled, the number of cables and ports required is much more manageable. It must be noted, however, that the cable count still scales linearly with the number of hosts, and port count can still be an issue in larger environments. Also, this design choice really hasn't added anything new on the server management front. Again, for smaller, relatively static environments this can work nicely, but if the infrastructure needs to scale quickly and efficiently, there are better options.
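Here is a quick comparison of how the port count grows in the two designs. The per-host cable counts are assumptions based on the layouts described above.

```python
# Port count as the cluster grows: traditional 1GbE vs converged 10GbE.
# Per-host cable counts are assumptions based on the designs described above.
TRADITIONAL_1GBE = 8   # separate redundant 1GbE uplinks per traffic type
CONVERGED_10GBE = 2    # a redundant pair of 10GbE/FCoE uplinks carrying all traffic

print(f"{'hosts':>5} {'1GbE ports':>11} {'10GbE ports':>12}")
for hosts in (4, 8, 16, 32):
    print(f"{hosts:>5} {hosts * TRADITIONAL_1GBE:>11} {hosts * CONVERGED_10GBE:>12}")
```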
Converged Blade Infrastructure
Large-scale ease of management, efficient scaling, and massive compute capacity can be achieved, without the inherent cable and port count problems, by using a converged blade infrastructure. In the example below, a Cisco UCS B-Series converged blade infrastructure is used to achieve these benefits.
Let’s look at the components of this infrastructure model.
- The UCS 6100 series fabric interconnects are similar to the FCoE switches in the Converged Rackmount Infrastructure. I say similar because it is still ideal to have upstream SAN and LAN switches. In this scenario the 6100 pair acts like a host (or multiple hosts) attached to the Fibre Channel fabric; it accomplishes this through an upstream switch that is NPIV capable.
- The blade chassis provide the backplane connectivity for your compute resources, or blades. Each chassis can have up to eight 10Gb FCoE ports for connectivity, and the blades share that connectivity to the upstream 6100s.
- The 6100s then take that FCoE traffic and split it into Fibre Channel to connect to the upstream SAN fabric and Ethernet to connect to the upstream LAN fabric.
- Instead of calculating bandwidth and port counts at the server level as you would in a traditional rackmount scenario, you calculate bandwidth needs at the 6100 level (see the sketch after this list).
- Less cabling, more scalability, easier management, smaller footprint.
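As a rough illustration of sizing bandwidth at the fabric interconnect rather than at each server, here is a minimal sketch. The chassis uplink counts and the oversubscription target are assumptions for the example, not a UCS sizing formula.

```python
# Hypothetical sketch: bandwidth is sized at the 6100 fabric interconnects,
# not per server. Uplink counts and the oversubscription target are assumptions.
chassis_count = 4
uplinks_per_chassis = 4       # of the up to eight 10Gb FCoE ports per chassis
blades_per_chassis = 8

blade_count = chassis_count * blades_per_chassis
chassis_bandwidth_gb = chassis_count * uplinks_per_chassis * 10   # into the 6100 pair

oversubscription = 4          # 4:1 target toward the upstream LAN/SAN (assumption)
upstream_gb_needed = chassis_bandwidth_gb / oversubscription

print(f"{blade_count} blades share {chassis_bandwidth_gb} Gb of chassis uplink bandwidth")
print(f"~{upstream_gb_needed:.0f} Gb of upstream LAN/SAN bandwidth at {oversubscription}:1")
```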
With the up-front investment in the 6100s, this architecture scales out very nicely at only incremental cost. The 6100s are also the single point of management for this infrastructure, via UCS Manager. UCS abstracts the unique information that would identify a server into a service profile. The types of data in a service profile include items such as the following (a sketch of a profile follows the list):
- BIOS settings
- WWNs
- MAC addresses
- Boot order
- Firmware revisions
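To make the abstraction concrete, here is a hedged sketch of a service profile represented as plain data. The field names and values are purely illustrative; they do not reflect the actual UCS Manager object model or API.

```python
# Illustrative only: a service profile represented as plain data. Field names and
# values are hypothetical and do not reflect the actual UCS Manager object model.
service_profile = {
    "name": "esx-host-01",
    "bios_settings": {"hyperthreading": True, "turbo_boost": True},
    "wwpns": ["20:00:00:25:b5:00:00:0a", "20:00:00:25:b5:00:00:0b"],
    "mac_addresses": ["00:25:b5:00:00:1a", "00:25:b5:00:00:1b"],
    "boot_order": ["san", "local-disk"],
    "firmware_revision": "2.x",
}

def associate(profile: dict, chassis: int, slot: int) -> None:
    """Pretend association step: the server identity travels with the profile."""
    print(f"Applying '{profile['name']}' to chassis {chassis}, slot {slot}")

# A new or replacement blade in the slot inherits the identity defined above
associate(service_profile, chassis=1, slot=3)
```

The idea is that the identity lives in the profile rather than in a specific piece of hardware, so configuration tied to WWNs or MAC addresses does not have to be redone when a blade is added or replaced.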
This way, settings that would normally be configured after a server arrives can be pre-configured. When a new blade arrives, you simply slide it into the chassis, assign the service profile, boot it, and it is ready for an OS install in minutes. If that OS is ESXi, the install itself only takes about five minutes as well.
With the Converged Blade Infrastructure, we have set up a foundation for easy incremental scalability as the environment grows. Using this as the model infrastructure, the upcoming posts will examine the different components involved in more detail so that you can get a holistic view of the entire virtual infrastructure as a structured approach is taken to building it out.