vMotion I/O Errors with HP NC522 10Gb NIC

We recently spun up a new VMware ESXi 4.1 cluster at $DAYJOB, running on some nice new HP DL380 G7 servers. We’re using the onboard 1Gb NICs for the management network and an HP NC522SFP dual-port 10Gb NIC for production, vMotion, and IP storage. Everything went smoothly until we started testing vMotion between hosts. Migrations would consistently fail somewhere between 10% and 40% complete with an I/O error:

I/O Error

After praying to the Google deity for a while, we hit upon the following VMware KB article: "vMotion fails on ESX/ESXi 3.5 and 4.0 with some versions of nx_nic and unm_nic drivers". The issue only seems to crop up when two conditions are both true: VLAN tagging is enabled on the vSwitch to which the NIC is connected, and TCP segmentation offload (TSO) is in use, which it is by default.

The fix is either to create a new VMkernel interface for vMotion with TSO disabled (and without VLAN tagging), or to upgrade the NIC driver in ESX/ESXi itself. In our case, since this was a new environment, we decided to fix it for good and do the upgrade. A quick download and a little vMA magic, and vMotion is now working flawlessly over 10Gb.
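
For anyone wanting to script the upgrade route across several hosts, here is a rough sketch of what the "vMA magic" could look like. It is not the exact procedure we ran. It is a small Python dry run that only prints the vCLI commands (vicfg-hostops and vihostupdate) in the order I would expect to run them from the vMA. The host name and the bundle file name are made-up placeholders, and the script assumes you wait for the host to come back after the reboot before running the last two commands.

```python
# Dry-run sketch: build (but do not execute) the vMA/vCLI commands for
# installing a NIC driver offline bundle on one ESXi 4.x host.
# The host name and bundle file name used below are hypothetical.

def driver_upgrade_commands(host, bundle):
    """Return the commands, as argv lists, in the order they would be run."""
    target = ["--server", host]
    return [
        # Put the host into maintenance mode before touching drivers.
        ["vicfg-hostops", *target, "--operation", "enter"],
        # Install the driver from the downloaded offline bundle.
        ["vihostupdate", *target, "--install", "--bundle", bundle],
        # Reboot so the new driver is loaded.
        ["vicfg-hostops", *target, "--operation", "reboot"],
        # Once the host is back: list installed bulletins to confirm.
        ["vihostupdate", *target, "--query"],
        # Leave maintenance mode.
        ["vicfg-hostops", *target, "--operation", "exit"],
    ]


if __name__ == "__main__":
    for argv in driver_upgrade_commands("esx01.example.com",
                                        "nic-driver-offline-bundle.zip"):
        print(" ".join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; on a real vMA you would paste them in one at a time and supply credentials as your environment requires.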