Network virtualization abstracts network functions so they can be defined in software. Virtual machines are set up with virtual Network Interface Cards (NICs) that connect to virtual switches, creating a virtual network. Virtual switches are logical representations of physical switches: they can carry traffic of different types, maintain the virtual networks, and manage all network traffic involving virtual machines, including the connections between guests and the outside network. The hypervisor picks up requests coming through the guests’ virtual NIC drivers, passes them to its network emulator, and forwards them to the outside network through a physical NIC. A response takes the path in reverse, back to the application in the virtual machine.
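The request path described above can be sketched as a simple hop trace. This is a minimal illustration with assumed names, not a real hypervisor API:

```python
# Illustrative sketch of the outbound request path: guest application ->
# virtual NIC driver -> virtual switch -> physical NIC -> outside network.
# The response retraces the same hops in reverse.

def outbound_path(frame):
    """Trace a frame from the guest application to the outside network."""
    hops = ["guest app", "virtual NIC driver", "virtual switch",
            "physical NIC", "outside network"]
    return [(hop, frame) for hop in hops]

def inbound_path(frame):
    """The response takes the same hops in reverse order."""
    return list(reversed(outbound_path(frame)))

trace = outbound_path("HTTP GET")
print([hop for hop, _ in trace])
```

The hop names are placeholders; the point is only that every frame crosses the virtual switch before reaching physical hardware.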
A virtual network can be used to create an enclosed environment between guests on a host. This is done by connecting the virtual machines to the same virtual switch and not allowing any physical NIC to connect to that virtual switch. The communication, which takes place in memory, is not only high-speed; keeping the traffic off the physical network also improves data security and integrity. Applications on different virtual machines that communicate frequently should be placed on the same host for the lowest network latency.
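The isolation rule can be modeled in a few lines. This is a minimal sketch with assumed class and method names, not any hypervisor's actual interface:

```python
# Sketch of an isolated virtual network: a virtual switch with no
# physical-NIC uplink can only carry traffic between its own guests.

class VirtualSwitch:
    def __init__(self, name, uplink=False):
        self.name = name
        self.uplink = uplink      # True if a physical NIC is attached
        self.guests = set()

    def attach(self, vm):
        self.guests.add(vm)

    def can_reach(self, src, dst):
        # Guests on the same switch always reach each other (in memory).
        if src in self.guests and dst in self.guests:
            return True
        # Reaching the outside network requires a physical uplink.
        return src in self.guests and dst == "outside" and self.uplink

isolated = VirtualSwitch("vswitch-internal", uplink=False)
isolated.attach("vm-a")
isolated.attach("vm-b")
print(isolated.can_reach("vm-a", "vm-b"))     # guests communicate
print(isolated.can_reach("vm-a", "outside"))  # no path to the outside
```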
When connecting to storage devices through both physical and virtual networks, dedicate virtual and physical switches to storage traffic only. Some hypervisors offer settings that prioritize certain traffic types or virtual machines. Multipathing, having more than one path to and from storage, protects against failure and maintains availability. It can also improve performance by load-balancing data across the paths using NIC teaming, a group of physical network adapters connected to the same virtual switch.
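One common way to spread storage traffic across a NIC team is round-robin assignment. The sketch below is illustrative (class and NIC names are assumptions), but it shows both the load-balancing and the failure-protection aspects of multipathing:

```python
from itertools import cycle

# Sketch of NIC teaming: storage requests are spread round-robin across
# the healthy physical adapters; a failed adapter is simply skipped.

class NicTeam:
    def __init__(self, nics):
        self.healthy = list(nics)

    def fail(self, nic):
        self.healthy.remove(nic)

    def spread(self, requests):
        """Assign each storage request to the next healthy NIC."""
        paths = cycle(self.healthy)
        return [(req, next(paths)) for req in requests]

team = NicTeam(["nic0", "nic1"])
print(team.spread(["read-A", "write-B", "read-C"]))
team.fail("nic1")                 # multipathing survives a NIC failure
print(team.spread(["read-D"]))
```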
In a virtual cluster, multiple physical machines are connected through a network and share a resource pool. For finer-grained allocation, the resource pool can be split into smaller sections, called child resource pools, as illustrated in figure 1 below. In case of failure, clustering provides quick recovery by redirecting network traffic for the affected application to another device. Placement algorithms verify that enough resources are available on other hosts in the cluster before restarting the affected virtual machines there. Automated load balancing can migrate virtual machines automatically when they are underperforming or their host is failing. When a physical device needs to go offline, its guests are migrated to other hosts in the cluster, avoiding downtime for the applications. If a cluster becomes unbalanced even though load balancing already uses the existing resources effectively, the solution is to add more physical hosts to the cluster.
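The admission check performed before restarting a failed host's guests can be sketched as a first-fit search over the surviving hosts. Host names and capacity figures below are illustrative assumptions:

```python
# Sketch: before restarting a failed host's VM elsewhere, verify that a
# surviving host has enough spare capacity; if none does, the cluster
# needs more physical hosts.

def place_vm(vm_demand, hosts):
    """Return the first host with enough free capacity, else None."""
    for name, (capacity, used) in hosts.items():
        if capacity - used >= vm_demand:
            hosts[name] = (capacity, used + vm_demand)
            return name
    return None  # cluster is full: add more physical hosts

cluster = {"host-b": (16, 14), "host-c": (16, 8)}  # (CPU cores, in use)
print(place_vm(4, cluster))  # host-c has 8 cores free
print(place_vm(4, cluster))  # host-c again (4 cores still free)
print(place_vm(4, cluster))  # no host fits: time to grow the cluster
```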
Live migration is moving a virtual machine while it is running. To make it work without compromising data integrity or interrupting the user experience, dedicated switches are needed for this bandwidth-intensive operation. Live migration consists of three phases that minimize unavailability. First is the pre-copy phase, which copies the running virtual machine’s memory to the new physical device. Memory pages that change after being copied are recorded and copied again in the next phase, called stop-and-copy. This second stage suspends the virtual machine for a short time so the remaining copying can be completed. In the third step, the post-copy phase, the information needed to unsuspend the virtual machine is sent to the new physical device.
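The three phases can be sketched as an iterative dirty-page loop. Dirty-page tracking is simulated here with random choices, and the round limit and dirtying rate are illustrative assumptions:

```python
import random

# Sketch of the three live-migration phases: iterative pre-copy,
# a brief stop-and-copy, and post-copy resumption on the target host.

def live_migrate(memory_pages, max_rounds=5):
    random.seed(1)  # deterministic simulation
    # Phase 1: pre-copy - copy all pages, then re-copy pages dirtied
    # during the previous round, until rounds run out or nothing is dirty.
    dirty = set(memory_pages)
    rounds = 0
    while dirty and rounds < max_rounds:
        transferred = set(dirty)
        # The guest keeps running, so some pages get dirtied again.
        dirty = {p for p in transferred if random.random() < 0.3}
        rounds += 1
    # Phase 2: stop-and-copy - suspend the VM briefly and copy the
    # remaining dirty pages so no changes are lost.
    remaining = dirty
    # Phase 3: post-copy - send the state needed to unsuspend the VM
    # on the new physical device.
    return {"rounds": rounds, "final_copy": len(remaining),
            "resumed_on_target": True}

print(live_migrate(range(100)))
```

Each pre-copy round shrinks the dirty set, so the suspended stop-and-copy window stays short.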
Affinity & Anti-affinity
Whether different virtual machines may run on the same physical host can be configured with affinity and anti-affinity rules. VM affinity guarantees that certain guests run on the same host, while anti-affinity ensures that specific guests never share a host. A virtual machine’s redundant counterpart should not run on the same physical host, whereas guests that communicate constantly benefit from sharing one.
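A placement checker for such rules can be written in a few lines. The rule format here is an illustrative assumption, not any specific hypervisor's syntax:

```python
# Sketch of affinity / anti-affinity validation for a VM placement.

def allowed(placement, affinity, anti_affinity):
    """placement maps VM -> host; rules are pairs of VM names."""
    for a, b in affinity:           # these pairs must share a host
        if placement[a] != placement[b]:
            return False
    for a, b in anti_affinity:      # these pairs must never share one
        if placement[a] == placement[b]:
            return False
    return True

rules_aff = [("web", "cache")]               # chatty guests together
rules_anti = [("db-primary", "db-replica")]  # redundant pair kept apart
print(allowed({"web": "h1", "cache": "h1",
               "db-primary": "h1", "db-replica": "h2"},
              rules_aff, rules_anti))        # valid placement
print(allowed({"web": "h1", "cache": "h2",
               "db-primary": "h1", "db-replica": "h1"},
              rules_aff, rules_anti))        # violates both rules
```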
Fault Tolerance (FT) is similar to clustering in that a machine affected by failure starts up on another device. Enabling fault tolerance requires a second physical device to which the primary virtual machine’s state is continuously duplicated. The primary and secondary machines run in lockstep and track each other’s heartbeat. If the primary machine’s host experiences a failure, the secondary machine takes over and becomes the primary, which triggers the creation of a new secondary machine on a third physical device.
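The failover step can be sketched as a simple promotion plus respawn. Host names and the data layout are illustrative assumptions:

```python
# Sketch of FT failover: when the primary's heartbeat stops, promote
# the secondary and create a new secondary on a spare third host.

def failover(pair, spare_hosts):
    """Promote the secondary and respawn a new secondary elsewhere."""
    pair["primary"] = pair["secondary"]
    pair["secondary"] = spare_hosts.pop(0) if spare_hosts else None
    return pair

ft = {"primary": "host-1", "secondary": "host-2"}
spares = ["host-3"]
print(failover(ft, spares))
# → {'primary': 'host-2', 'secondary': 'host-3'}
```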