StarWind’s automatic failover and failback

JLaay · Sun Jan 17, 2010 11:04 am

Hi Aitor,

Well I think I'm to blame for getting this specific info into the forum. I asked for it.

And Aitor: Wow lots of info.

I have to let things sink in.

Some upfront, and maybe silly, questions

1. From info I got: running Starwind as VM in Hyper-V is officially supported. Not VMware. Correct?
2. 4) So I've got two identical nodes ...
Is this RAID-1 the SW raid-1?
3. 'since then I have also received an unreleased (so far) patch from support that ...., but does seem to have helped a little bit'
In what way did it help? Synchronization errors I bet ... but can SW explain what they altered to explain this?
4. If your logging is not sufficient for SW to find the problem in your specific situation, does SW have other means, e.g. a tool, that they use for this purpose?

Greetz Jaap

PS Sorry I hit you

Above all that it didn't help

Aitor_Ibarra · Sun Jan 17, 2010 3:54 pm

Hi Jaap,

1) Don't know about vmware, but I'm sure it will work fine. I remember a press release announcing support for running inside a hyper-v vm - regardless, it wasn't a good idea until hyper-v r2, as the original hyper-v didn't support jumbo frames inside VMs. I'm not having issues with 4.2 in a VM with a production workload, so I'd be surprised if this issue was due to virtualisation.

2) RAID-1 is handled by the RAID controllers, so it's hardware raid; windows sees one disk when really there are two. For the starwind 5 HA tests, I'm using a 300GB RAID-1 volume on a couple of 7200RPM 2.5" disks. Not very high performance, but I'm not doing performance tests yet!

3) It seems to have helped in that I've not had a data corruption issue yet, although I've not pushed it as hard as 5.0; however I very quickly got to the stage where the starwind service is crashing before I can sync. The patch wasn't specifically to address my problem so I'm not sure if it has really helped or not... Also with 5.2 being able to force a full sync in either direction, providing I've got one good node it's more recoverable.

4) Starwind writes out a memory dump when it crashes; support have these dump files as well as the logs, although only for my initial tests with 5.0.

Also, a clarification on my setup: although the starwind 5 (and 4.2) instances are on hyper-v vms, the physical hosts are not part of a failover cluster. My "hyper-v nodes" are, but they are different physical machines.

cheers,

Aitor

Constantin (staff) · Mon Feb 22, 2010 10:37 am

Do you have any additional questions?

TomHelp4IT · Fri Feb 26, 2010 2:44 am

Thanks all, I've been away from this forum for a month or so and you've pretty much covered all the questions I had started to ask about HA/sync a while back (and some more I hadn't thought of) and didn't get very far with. Auto-failback is an important feature for us as we support a few client installs where we can't be checking the SAN servers all the time, which is linked to the other issue on my wish list - decent alerting! An email when a node is out of sync isn't much to ask for, but I seem to remember Anton saying they were adding it soon.

I had achieved both the split brain and data corruption scenarios Aitor describes in testing; for the former I decided teaming two NICs for the sync links would provide enough redundancy. The latter I only achieved by really messing around with the servers and networking so I didn't feel it counted, but I'm concerned you've had it due to a simple server crash. Having said that, we did experience corruption quite a few times with one Linux VM when having problems with the HA on a pair of ESXi hosts, but my Linux expert came to the conclusion that it was the result of the way its ext3 filesystem was failing to properly handle storage disconnections; other VMs on the same hosts with ReiserFS and NTFS filesystems recovered automatically.

Overall we're willing to compromise to a certain extent. At present we don't recommend Starwind where the client has serious uptime requirements, so as long as the risk of losing SAN datastores is reasonably minimal it's not going to be the end of the world if a restore from backup is required.

Btw, our senior engineer has a 100% Basque name but is 75% Sunderland.

Constantin (staff) · Fri Feb 26, 2010 9:14 am

Automated failover will be implemented in StarWind 5.5, which will be released in Q1, while notification services will be released in future versions.
Also note that we don't recommend running StarWind in VMs on ESX, because ESX can drop sessions under high load; we will fix this in a future version.