Virtually the best blog on the web!
Posts tagged LiteGreen
LiteGreen: Saving Energy in Networked Desktops Using Virtualization
Jun 21st
Update: LiteGreen wins best paper award!
I will be presenting LiteGreen this week at the USENIX Annual Technical Conference. LiteGreen saves desktop energy by migrating idle desktops (running in VMs) to a central server. More details are on the Project page. Slides and video of my presentation are available from the talks page.

Understanding Live Migration of Virtual Machines
Jun 13th
Introduction
In the virtualization community, live migration of virtual machines is pretty much considered a “default” mature feature of any hypervisor product. All major vendors, including VMware, XenSource/Citrix, and Microsoft, have products that support live migration. I consider it a major success for the virtualization research community, since the first academic paper on it was published not so long ago, at NSDI 2005. VMware’s VMotion technology (I think) predates this paper, but its technical details were largely unknown.
This post is partly inspired by some of the questions I received during a recent LiteGreen talk. Non-virtualization folks seem to misunderstand some aspects of live migration, so in this post I will explain live migration and some “gotchas” in using it.
Alright, what is live migration? Migrating a virtual machine simply means moving a VM running on one physical machine (let’s call it the source node) to another physical machine (let’s call it the target node). The trick is to do this while the VM is running on the source node, and without disrupting any active network connections, even after the VM has moved to the target node. It is considered “live” because the original VM keeps running while the migration is in progress. The huge benefit of live migration is its very small downtime, on the order of milliseconds.
How to achieve live migration
To move a VM from the source node to the target node, we need to consider moving its CPU state, memory content, storage content, and network connections.
- Migrating CPU state has been extensively researched in the context of process migration. See Berkeley Lab Checkpoint/Restart (BLCR) for implementation details.
- Migrating memory content is a bit tricky, because the VM on the source node is still running and modifying its memory. The idea is to copy the memory contents iteratively, sending only the “delta” (the pages changed since the previous round) to the target node in each round. Eventually, only a small “delta” remains to be copied. At this stage, the VM on the source node is paused, the remaining delta is copied, and the VM is resumed on the target node. This brief pause is what causes the downtime.
- Migrating storage content is similar to memory, but requires a lot more time, and the migration may take on the order of minutes. It may not be easy to guarantee a small, millisecond-scale downtime with storage migration. All current commercial products side-step this problem by using centralized storage (e.g. NFS, iSCSI, or a Fibre Channel based SAN) that hosts the VM images. Storage content doesn’t have to be migrated if both the source and target nodes are connected to the centralized storage. There are a few research solutions (1, 2, 3) that try to address this problem.
- Migrating network connections is simple if you assume that all of the nodes are in the same IP subnet. When the VM is migrated to the target node, it simply has to send an ARP broadcast (a gratuitous ARP) announcing that its IP address has moved to a new physical (MAC address) location. Since this happens at the boundary between Layer 2 and Layer 3 of the network stack, the change is transparent to the transport layer, and TCP connections survive the migration. As a result, applications see no disruption in network connections. Clearly, this approach doesn’t work if the VM has to cross subnets. There are many ways (1, 2) to solve this problem, but all of them have huge performance implications.
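To make the memory step concrete, here is a minimal Python sketch of the iterative copying idea described above. It is a toy simulation under my own assumptions, not how any particular hypervisor implements it: memory is a plain list of page values, and the names live_migrate, writes_per_page_sent, and stop_threshold are hypothetical. The running VM is modeled by random writes whose number is proportional to how many pages were just copied, since a longer round gives the VM more time to dirty memory.

```python
import random


def live_migrate(source, writes_per_page_sent=0.2, stop_threshold=8,
                 max_rounds=30, seed=0):
    """Toy pre-copy migration of a list of "pages".

    Note: `source` is modified in place to model the VM writing to memory.
    Returns (target, rounds, pages_sent_live, pages_sent_paused).
    """
    rng = random.Random(seed)
    num_pages = len(source)
    target = [None] * num_pages
    dirty = set(range(num_pages))  # first round: every page must be sent
    rounds = 0
    pages_sent_live = 0

    # Iterative phase: the VM keeps running while we copy.
    while len(dirty) > stop_threshold and rounds < max_rounds:
        batch = sorted(dirty)
        dirty = set()
        for page in batch:
            target[page] = source[page]
        pages_sent_live += len(batch)
        rounds += 1

        # Assumption: the VM dirties pages in proportion to the copy time,
        # which is itself proportional to the number of pages just sent.
        for _ in range(int(len(batch) * writes_per_page_sent)):
            page = rng.randrange(num_pages)
            source[page] += 1
            dirty.add(page)  # this page is now stale on the target

    # Stop-and-copy phase: the VM is paused, so no new writes can happen.
    for page in dirty:
        target[page] = source[page]
    pages_sent_paused = len(dirty)

    return target, rounds, pages_sent_live, pages_sent_paused


if __name__ == "__main__":
    memory = list(range(1024))
    copy, rounds, live, paused = live_migrate(memory)
    print("rounds:", rounds, "sent live:", live, "sent while paused:", paused)
    print("target matches source:", copy == memory)
```

With the default parameters, a 1024-page VM converges in a few rounds and the final paused copy moves only a handful of pages, which is why the pause is so short compared to the whole migration. If the VM dirties pages faster than they can be sent, the delta never shrinks, and the max_rounds cap is what forces the final (longer) pause.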
The success and popularity of live migration lie in its very small downtime, and some of the limitations I mentioned above are inherent to achieving that goal.
Benefits of live migration
One of the primary use-cases for live migration is resource management in cloud computing. For example, cloud computing providers like Amazon EC2 have thousands of VMs running in their data centers. To save energy, reduce cost, and balance load, they can move VMs using live migration without disrupting the customer applications running in the VMs. How to do this efficiently (or optimally) is a big research question, and some solutions (1) are available.
Frequently asked questions about live migration
Before I conclude, here are answers to some frequently asked questions regarding live migration.
- What is the usual VM downtime? The original Xen paper reported downtimes as small as 60 ms for specific workloads. The downtime depends on the application running in the VM, and network-heavy applications may see disruptions even if the downtime is small.
- What is the total migration time? This is where most people get confused, since the total time it takes to perform a live migration is different from the downtime. It is usually on the order of tens of seconds (ranging from 10 to 120 seconds). This time depends on how heavily the VM on the source node is modifying its memory content.
- How is live migration different from suspend/resume? Suspend/resume is similar to migration, since you can suspend a VM to disk (on centralized storage), and then resume the VM on a different physical machine. This works, but it is not “live”: the VM is unreachable while it is suspended, so active network connections cannot be resumed in this model.
- How is live migration different from migrating to the cloud? This is an unfortunate source of confusion, since migration may mean different things to different people. When people talk about migrating to the cloud, they are usually talking about moving their computation and data to a cloud like Amazon EC2 or Windows Azure. This is not live, and it may not really be VM migration.
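Coming back to the downtime versus total migration time question above, a rough back-of-the-envelope model may help. The sketch below is only an illustration under simplifying assumptions of my own: a constant link bandwidth, a constant rate at which the VM dirties memory, and a downtime that counts only the final memory copy. The function name estimate_migration and all of its parameters are hypothetical.

```python
def estimate_migration(mem_mb, bandwidth_mb_s, dirty_mb_s,
                       stop_mb=1.0, max_rounds=30):
    """Estimate (total_seconds, downtime_seconds, rounds) for pre-copy.

    Each round sends the data left over from the previous round; while it
    is being sent, the VM dirties dirty_mb_s * round_time more memory.
    """
    remaining = float(mem_mb)  # first round: all of memory
    live_seconds = 0.0
    rounds = 0

    while remaining > stop_mb and rounds < max_rounds:
        round_time = remaining / bandwidth_mb_s
        live_seconds += round_time
        rounds += 1
        # Memory dirtied during this round, capped at the VM's memory size.
        remaining = min(float(mem_mb), dirty_mb_s * round_time)

    # The VM is paused only while the last remaining delta is sent.
    downtime = remaining / bandwidth_mb_s
    return live_seconds + downtime, downtime, rounds


if __name__ == "__main__":
    # Assumed numbers: 1 GB VM, ~gigabit link (100 MB/s), 20 MB/s dirtying.
    total, downtime, rounds = estimate_migration(1024, 100, 20)
    print(f"rounds={rounds} total={total:.2f}s downtime={downtime * 1000:.1f}ms")
```

With these assumed numbers, the model gives about 12.8 seconds of total migration time but only a few milliseconds of pause. Real downtimes are larger, since this ignores CPU state transfer, the ARP announcement, and other overheads, but the gap between the two numbers is the point.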