Wednesday, May 27, 2009

Moving ZFS root filesystem from one computer to another

A few weeks ago, I decided to move the drives in my OpenSolaris (NexentaCP 1.0.1) box from one computer into another computer with slightly different hardware. The old machine used an Intel ICH6 SATA controller; the new box uses an inexpensive SIL3124-based four-port PCI controller card.

GRUB seemed to load fine, and was able to find and load the kernel (unix) and boot_archive files. However, Nexenta didn't boot: the kernel flashed an error message and immediately rebooted the system. After I added the -kv kernel debug option in GRUB, I was able to read the error message:
panic: cannot mount root path /ramdisk:a
After a great deal of effort, I finally figured out the solution. The problem was that the zpool.cache file inside boot_archive was telling the Solaris kernel to look for the root filesystem in the wrong place: it still described the pool as it was seen on the old hardware. Booting from the Nexenta CD in rescue mode, mounting the root filesystem, and executing bootadm update-archive -R wasn't enough, because that only rebuilds the archive from the files already on disk, and the zpool.cache file on disk was never refreshed.

The solution, kind reader: if you move your ZFS root filesystem from one box to another, boot a Solaris "rescue" CD (I just use the NexentaCP 1.0.1 install CD), import your root pool, and manually refresh the zpool.cache file. Then update your boot archive using bootadm. On Nexenta, the root filesystem is named syspool/rootfs-nmu-XXX, where XXX corresponds to an apt-cloned version of the root filesystem (002 in my case). Here are the steps that I used, once I booted into the Nexenta installer and pressed the F3 key for the shell:
# mkdir /tmp/mnt

# zpool import -f syspool

# mount -F zfs syspool/rootfs-nmu-002 /tmp/mnt

# cp /etc/zfs/zpool.cache /tmp/mnt/etc/zfs/zpool.cache

# bootadm update-archive -R /tmp/mnt

Creating boot_archive for /tmp/mnt
updating /tmp/mnt/platform/i86pc/boot_archive

# umount /tmp/mnt

# reboot
...and voilà! The system works again.
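The copy step above can be wrapped in a small helper that only overwrites the cache when it actually differs, and keeps the stale copy around in case you need to undo the change. This is a minimal sketch, not something shipped with Nexenta or Solaris: the function name refresh_cache and the .old backup suffix are my own inventions, and it assumes the rescue shell provides cmp and cp.

```shell
#!/bin/sh
# refresh_cache SRC DEST
# Hypothetical helper (not part of any Solaris or Nexenta tool).
# Copies SRC over DEST only when the two files differ, and saves the
# previous DEST as DEST.old first. Prints "updated" or "unchanged".
refresh_cache() {
    src=$1
    dest=$2

    # Without a source cache there is nothing to copy.
    if [ ! -f "$src" ]; then
        echo "missing source cache: $src" >&2
        return 1
    fi

    # cmp -s is silent and exits 0 only when the files are identical.
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        echo "unchanged"
        return 0
    fi

    # Keep the stale copy so the change can be undone by hand.
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.old" || return 1
    fi

    cp "$src" "$dest" || return 1
    echo "updated"
}
```

With the root filesystem mounted on /tmp/mnt as in the steps above, it would be called as refresh_cache /etc/zfs/zpool.cache /tmp/mnt/etc/zfs/zpool.cache, followed by the same bootadm update-archive -R /tmp/mnt as before. If it reports "unchanged", a stale cache is probably not what is keeping the system from booting.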

I've also noticed that NexentaCP 2 doesn't seem to want to boot if I have a SIL3124 card installed in the machine. This looks like a bug introduced in a more recent OpenSolaris build; NexentaCP 1.0.1 works just fine with the card.