Justin Azoff

Hi! Random things are here :-)

Xen live migration without shared storage

The problem

The Xen documentation on live migration states:

Currently, there is no support for providing automatic remote access to filesystems stored on local disk when a domain is migrated. Administrators should choose an appropriate storage solution (i.e. SAN, NAS, etc.) to ensure that domain filesystems are also available on their destination node. GNBD is a good method for exporting a volume from one machine to another. iSCSI can do a similar job, but is more complex to set up.

This does not mean that it is impossible, though. Live migration is just a more efficient form of regular migration, and a regular migration can be seen as a save on one node followed by a restore on another. Normally, if you save a VM on one machine and try to restore it on another machine, the restore will fail when the domain is unable to read its filesystems. But what would happen if you copied the filesystems to the other node between the save and the restore? If done right, it works pretty well.

The solution?

The solution is simple:

  • Save running image
  • Sync disks
  • Copy image to other node, restore

The downtime can be shortened considerably by syncing the disks twice (a rough sketch of the whole procedure follows this list):

  • Sync disks
  • Save running image
  • Sync disks again - this pass only has to transfer whatever changed in the last few seconds
  • Copy image to other node, restore
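
Put together, the procedure looks roughly like the script below. This is only a sketch: the domain name, destination host, and vbd path are examples, and the real xen_migrate.sh (see Automation below) figures out the vbds itself.

#!/bin/sh
# Rough sketch of the two-pass migration; not the real xen_migrate.sh.
DOMID=test            # example domain name
DSTHOST=192.168.1.2   # example destination host

# first pass while the domain is still running: moves the bulk of the data
blocksync.py /dev/xen/$DOMID-root $DSTHOST

# save the domain; downtime starts here
xm save $DOMID $DOMID.dump

# second pass: only the blocks that changed since the first pass
blocksync.py /dev/xen/$DOMID-root $DSTHOST

# copy the saved image over and restore it; downtime ends when this finishes
scp $DOMID.dump $DSTHOST:
ssh $DSTHOST "xm restore $DOMID.dump && rm $DOMID.dump"
rm $DOMID.dump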

Synchronizing block devices

File backed

If you are using plain files as vbds, you can sync the disks using rsync.
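
Something along these lines should work (the image path here is just an example; --inplace makes rsync update the destination file directly instead of building a new copy first):

$ rsync -av --inplace /var/lib/xen/images/vm-root.img 192.168.1.2:/var/lib/xen/images/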

Raw devices

If you are using raw devices, rsync cannot be used. I wrote a small utility called blocksync which can synchronize two block devices over the network. In my testing it was easily able to max out the network on an initial sync, and max out the disk read speed on a resync.

$ blocksync.py /dev/xen/vm-root 1.2.3.4

This will sync /dev/xen/vm-root onto 1.2.3.4. The device should already exist on the destination and be the same size.
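
If the destination device does not exist yet and you are using LVM, something like the following can be used to check the source size and create a matching volume (the volume group name xen and the 1G size are just examples):

$ blockdev --getsize64 /dev/xen/vm-root
$ ssh 1.2.3.4 lvcreate -L 1G -n vm-root xen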

Solaris ZFS

If you are using ZFS, it should be possible to use zfs send to sync the block devices before migration. This would give an almost instantaneous sync time.
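
The idea would be roughly the following (untested; the pool and volume names here are made up):

$ zfs snapshot tank/vm-root@migrate1
$ zfs send tank/vm-root@migrate1 | ssh 1.2.3.4 zfs receive tank/vm-root
# save the domain with xm save, then send only what changed since the first snapshot:
$ zfs snapshot tank/vm-root@migrate2
$ zfs send -i migrate1 tank/vm-root@migrate2 | ssh 1.2.3.4 zfs receive tank/vm-root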

Automation

A simple script xen_migrate.sh and its helper xen_vbds.py will migrate a domain to another host. File and raw vbds are supported. ZFS send support is not yet implemented.

Example migration

#migrating a 1G / + 128M swap over the network
#physical machines are 350MHz with 64M of RAM,
#total downtime is about 3 minutes

xen1:~# time ./migrate.sh test 192.168.1.2
+ '[' 2 -ne 2 ']'
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root
/dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump                                       100%   16MB   3.2MB/s   00:05
+ restore_image
+ ssh 192.168.1.2 'xm restore test.dump && rm test.dump'
(domain
    (id 89)
    [domain info stuff cut out]
)
+ rm test.dump

real    6m6.272s
user    1m29.610s
sys     0m30.930s