XenServer DR Best Practices for HA

mogulbumm
Posts: 6
Joined: Tue Dec 15, 2009 8:21 pm

Mon Jul 19, 2010 7:07 pm

Please publish something on XenServer best practices for HA. This seems to be a "broken" part of the software. If a server goes down, we need to know the steps to take for it to come back up.

Using two 4 TB iSCSI units:

Examples:

1. Storage server 1 needs to be rebooted for maintenance. HA properly fails over to storage server 2, but multipath now shows only 1 of 1 path active, even after storage server 1 comes back online. If you try to sync at that point (or have the setup configured to sync automatically), it takes so long that it renders the system unusable (1% after almost 2 hours?). Hence the only way to get it back online and in HA mode is to force-remove the devices. What is the regular process for re-establishing the multipath environment and getting HA back up and functional?

2. We needed to power down either both storage servers or the XenServers for maintenance. When the servers come back online, the iSCSI virtual disk storage shows as not connected at all. No command will reconnect it properly; it reports that it can no longer log in (it claims an iSCSI login failure). We possibly have to start over and recreate the entire setup. Again, what is the process here?

Apparently, any time we actually kick into HA mode, we have to start all over again with the console commands to get back up and running. There is no way to automatically reconnect to the original or even resync the devices in a reasonable timeframe to re-establish HA (it seems that when you resync, the entire storage array is offline until the sync completes?).

Please let me know what processes have been developed so that we can use this in a true HA environment in Xen.
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Tue Jul 20, 2010 10:55 am

Mogulbumm, we're working on a detailed document for managing HA with Xen. In the meantime, the only document available is the StarWind HA + Xen guide, which you probably used during the configuration process. The problem is caused not by a "broken" part of our software but by the way Xen works with multipathed storage. In XenServer 5.6, Citrix has already implemented a feature that lets you avoid downtime of the whole configuration.
After an HA node failure, go to the console and type this:
iscsiadm -m node -T iqn.2008-08.com.starwindsoftware:starwind1.starwindsan.com-xenha3 -p 192.168.1.80 --logout
Put in the IQN of the device that recently failed. This clears the login entry in Xen; if you try to reconnect without issuing this command, Xen will not let you, because it thinks you are still logged in to the target.
Then simply issue the login command for the recently failed target:
iscsiadm -m node -T iqn.2008-08.com.starwindsoftware:starwind1.starwindsan.com-xenha3 -p 192.168.1.80 -l
We are working with Citrix to make customers' lives easier, so it is just a matter of time before you will be able to add an HA target the same way you add a simple iSCSI datastore.
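
For anyone who has to repeat the logout-then-login sequence above, it could be wrapped in a small shell function along these lines. This is only a sketch: relogin_target and the DRY_RUN variable are made-up names, not part of iscsiadm or StarWind, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: relogin_target and DRY_RUN are hypothetical names.
# With DRY_RUN=1 (the default) the iscsiadm commands are printed, not run.

relogin_target() {
    iqn="$1"      # IQN of the target that recently failed
    portal="$2"   # IP address of the storage node hosting that target
    if [ -z "$iqn" ] || [ -z "$portal" ]; then
        echo "usage: relogin_target <target-iqn> <portal-ip>" >&2
        return 2
    fi

    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi

    # Log out first so the stale login entry on the Xen host is cleared.
    $run iscsiadm -m node -T "$iqn" -p "$portal" --logout
    # Then log in to the recovered target again.
    $run iscsiadm -m node -T "$iqn" -p "$portal" --login
}
```

Calling relogin_target with the IQN and address from the commands above prints the two iscsiadm lines; setting DRY_RUN=0 on the Xen host would actually run them.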

For maintenance, turn off all the VMs that reside on the iSCSI datastore, then disconnect the datastore and log off both target sessions as described above. Then go to the StarWind management console and make sure that only one iSCSI connection remains, the one that represents the sync channel. The next, optional step is to synchronize the targets if they report an unsynchronized state. After this you can remove the targets and turn the servers off for any kind of maintenance or relocation. After turning everything back on, recreate the device using the existing image files and specify the "do not synchronize virtual disks" initialization method. Then log the targets back in on the Xen hosts.

One more thing: sometimes, when logging in on the Xen host, you may notice that StarWind reports an "unsynchronized" state for the partner on one of the nodes while the partner node reports itself as synchronized. In this case, start a fast sync on the node that is disconnected from the client (it shows 1 iSCSI session instead of 2 or more, depending on the config).
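
The session counts mentioned above are read in the StarWind console, but a rough cross-check from the Xen host side might look like the sketch below. count_sessions is a made-up helper name, and the one-line-per-session layout of `iscsiadm -m session` output shown in the comment is an assumption that may differ between open-iscsi versions.

```shell
#!/bin/sh
# Sketch only: count_sessions is a hypothetical helper.
# It reads session lines on stdin and prints how many use the given portal IP.
# Assumed line layout: "tcp: [1] 192.168.1.80:3260,1 <target-iqn>"

count_sessions() {
    portal_ip="$1"
    awk -v ip="$portal_ip" '
        # Field 3 is "ip:port,tpgt"; keep the part before the first colon.
        { split($3, parts, ":"); if (parts[1] == ip) n++ }
        END { print n + 0 }
    '
}
```

On the Xen host this would be used as `iscsiadm -m session | count_sessions 192.168.1.80`; a count of 0 for one storage node suggests that is the node the host is not connected to.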
Max Kolomyeytsev
StarWind Software
mogulbumm
Posts: 6
Joined: Tue Dec 15, 2009 8:21 pm

Tue Jul 20, 2010 12:52 pm

Thanks Max. I wasn't implying the software was broken, just that there is no documented procedure for this.

So to clarify:

In 5.5: To perform any type of routine server maintenance on the storage servers:

- Power off all virtual machines
- Disconnect the iSCSI datastore
- Log off targets
- Remove targets
- Perform maintenance
- Reconfigure HA array using existing image files (do not sync)
- Reconnect to Xen
- Power virtual machines back on
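
The "Log off targets" step in the checklist above is the only part done with iscsiadm on the Xen host, so it could be sketched as a loop over both HA targets. logout_targets, the "iqn@portal" argument format, and DRY_RUN are all made up for illustration; with DRY_RUN=1 (the default) the commands are only printed. The other steps are done in XenCenter and the StarWind console and are not covered here.

```shell
#!/bin/sh
# Sketch only: logout_targets and DRY_RUN are hypothetical names.
# Each argument is "<target-iqn>@<portal-ip>", one per HA target.

logout_targets() {
    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    for entry in "$@"; do
        iqn="${entry%@*}"      # text before the last "@"
        portal="${entry#*@}"   # text after the first "@"
        # Log the Xen host out of this target before maintenance.
        $run iscsiadm -m node -T "$iqn" -p "$portal" --logout
    done
}
```

It would be called with one argument per storage node, for example `logout_targets "<iqn-of-node-1>@192.168.1.80" "<iqn-of-node-2>@<ip-of-node-2>"`, filling in the real IQNs and addresses.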

As long as I know the process, that is step 1.

Also, does this mean that you are fully compatible with 5.6? Does anything change with the original setup there, or can I follow the same setup guide as for 5.5?

Looking forward to the Xen document, since that is sorely missing right now. I don't mind the steps; the issue is that no document outside the setup guide covered this.

Thank you for your reply!
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Wed Jul 21, 2010 9:35 am

I'm not blaming you, I just wanted to say that we're not broken :)
All the steps you've described are right. Just make sure that when you log off the targets they are really logged off (even minor changes in one of the HA targets will cause a desynchronization, and I do not think you want to spend additional time synchronizing the targets).

Our compatibility with Xen is officially stated here: http://www.starwindsoftware.com/news/36
and on the Citrix website.
Max Kolomyeytsev
StarWind Software