HA Network Failover Problem

jeffhamm · Tue Sep 20, 2011 9:56 pm

We have a 2 node HA set running the latest build, and have two clustered HyperV hosts attached to the storage using CSVs. I have MPIO setup on the HyperV hosts. When I stop or start the StarWind Service on one of the nodes, the Virtual Machines keep running without any interruption in service.

But where we are having issues is when we simulate a complete network failure on one of the two StarWind nodes. If I disable all the network interfaces on one of the two nodes via a batch script (netsh disable...), the virtual machine freezes, and the LUNs completely disappear from both HyperV nodes. I then have to reboot all 4 boxes to get things running again. Obviously not a good situation.

I have tested and am able to replicate the above every time. Where would be a good place to start troubleshooting this issue?

Thanks,
Jeff

Tue Sep 20, 2011 10:03 pm

You're turning OFF all channels so putting down heartbeat as well. So node you're leaving alone has "slave" token and puts itself down to avoid split brain issue. You can toggle this behaviour but it's not recommended as you'll face SB sooner or later.

jeffhamm · Wed Sep 21, 2011 5:48 pm

So by default, if the heartbeat goes down for any reason (nic down, one node blue screens, etc ) the whole SAN goes offline?

Thu Sep 22, 2011 9:00 am

NO! By default if all links between HA nodes will go down (ALL means multiple synchronization channels and multiple heartbeat channels as well) node holding "slave" token will turn itself OFF to avoid split brain issue. With properly configured cluster (heartbeat routed thru initiator side subnetwork) you have ZERO chances to see both nodes down.

P.S. The only way to avoid such an issue completely is going to multiple HA nodes. More then two. Then we'll have a voting quorum. And we'll represent such a solution quite soon. So stay tuned

jeffhamm wrote:So by default, if the heartbeat goes down for any reason (nic down, one node blue screens, etc ) the whole SAN goes offline?

jeffhamm · Thu Sep 22, 2011 1:36 pm

But would not all the heartbeat networks go down if the StarWind node had a Blue Screen of Death? If the node that had the BSOD is the one holding the "Primary" token for all LUNs, does not the entire SAN still go down at that point?

rchisholm · Thu Sep 22, 2011 2:09 pm

Will the additional member of the voting quorum have to be a storage node? It would be great if it could just provide a quorum. In my situation, if it has to be a 3rd storage node, it increases my costs greatly. With 100's of TB's, the cost of the drives, controllers, servers, rack space, power, and cooling makes a big difference for a 3rd server.

anton (staff) wrote:NO! By default if all links between HA nodes will go down (ALL means multiple synchronization channels and multiple heartbeat channels as well) node holding "slave" token will turn itself OFF to avoid split brain issue. With properly configured cluster (heartbeat routed thru initiator side subnetwork) you have ZERO chances to see both nodes down.

P.S. The only way to avoid such an issue completely is going to multiple HA nodes. More then two. Then we'll have a voting quorum. And we'll represent such a solution quite soon. So stay tuned

jeffhamm wrote:So by default, if the heartbeat goes down for any reason (nic down, one node blue screens, etc ) the whole SAN goes offline?

jeffhamm · Thu Sep 22, 2011 5:55 pm

Anton - I think I can live with the Split Brain scenario for now. How do you change the default setting to allow the Slave to stay online?

Thanks,
Jeff

Thu Sep 22, 2011 8:28 pm

I think you cannot but whatever it's your data. Please drop a message to support@starwindsoftware.com so guys could help you. I don't want to publish "bad" advices on public

jeffhamm wrote:Anton - I think I can live with the Split Brain scenario for now. How do you change the default setting to allow the Slave to stay online?

Thanks,
Jeff

jeffhamm · Thu Sep 22, 2011 8:51 pm

I totally get not wanting to hand out "bad advise", but let me explain my idea:

- Set the StarWind Service to Manual instead of Auto on both nodes
- If my primary node goes down hard, the slave continues to run and service requests to virtual machines
- When my primary comes back online, StarWind service does not, so I can avoid data corruptions issues.

Does this make sense or is it crazy?

Thu Sep 22, 2011 9:30 pm

No. But you've messed whole thing up. "No network connection between nodes" and "One node went down" are different things. We do distinguish them and process differently. Your setup is SECOND (nodes going down) and you're talking about FIRST.

jeffhamm · Thu Sep 22, 2011 9:49 pm

OK - I get it that split brain is bad and will stop talking about that

What I'm trying to simulate is a situation where one of the two nodes goes down hard. I thought I could do this by just disabling all the network connections on the primary node. I'm guessing what you are trying to tell me is that this is a bad test?

Would a better test be for me to just push the reset button on the primary node? And if I do reset the primary node, is the expected behavior that the slave node will continue to service requests from virtual machines, or that the slave node will stop servicing requests at that point to avoid split brain?

Sorry to be such a pain - we're close to going into production with our Hyper-V cluster, and we need to make sure we have all the fail over scenarios accounted for and procedures for dealing with them in place if (or when) they occur.

Thanks!
Jeff

Thu Sep 22, 2011 9:58 pm

Yes it's a bad test. Turning node OFF and disabling all of its connections are two different things.

If node is dead other one would pick up it's work. Continue process requests.

There's no Master and Slave. There only Master and Slave token to process split brain issue. In all other things nodes are equal.

P.S. It does not mean you don't have something broken. So *DO* experiment with turning nodes on and off and resyncing everything BEFORE putting the whole thing to production. That's wise indeed.

jeffhamm wrote:OK - I get it that split brain is bad and will stop talking about that

What I'm trying to simulate is a situation where one of the two nodes goes down hard. I thought I could do this by just disabling all the network connections on the primary node. I'm guessing what you are trying to tell me is that this is a bad test?

Would a better test be for me to just push the reset button on the primary node? And if I do reset the primary node, is the expected behavior that the slave node will continue to service requests from virtual machines, or that the slave node will stop servicing requests at that point to avoid split brain?

Sorry to be such a pain - we're close to going into production with our Hyper-V cluster, and we need to make sure we have all the fail over scenarios accounted for and procedures for dealing with them in place if (or when) they occur.

Thanks!
Jeff

hixont · Fri Sep 23, 2011 12:19 am

rchisholm wrote:Will the additional member of the voting quorum have to be a storage node? It would be great if it could just provide a quorum. In my situation, if it has to be a 3rd storage node, it increases my costs greatly. With 100's of TB's, the cost of the drives, controllers, servers, rack space, power, and cooling makes a big difference for a 3rd server.

anton (staff) wrote:P.S. The only way to avoid such an issue completely is going to multiple HA nodes. More then two. Then we'll have a voting quorum. And we'll represent such a solution quite soon. So stay tuned

I would like to echo this concern. For SQL database servers that are mirrored (functionally what the SAN servers are doing) I only need a witness server that doesn't have to be a fully kitted out production SQL server. I can see reasons why I would want a fully functional three legged HA SAN configuration (offsite replication/failover for instance), but I can see equal validity in just having a witness server in place to act as a quorum voter. Would it be possible to have both options available? I haven't got the budgets to absorb another fully provisioned SAN server.

Fri Sep 23, 2011 6:56 am

1) We'll do support all of the listed scenarios. So you're not going to be forced to pick up working way.

2) What hypervisor do you run at this moment?

hixont wrote:
rchisholm wrote:Will the additional member of the voting quorum have to be a storage node? It would be great if it could just provide a quorum. In my situation, if it has to be a 3rd storage node, it increases my costs greatly. With 100's of TB's, the cost of the drives, controllers, servers, rack space, power, and cooling makes a big difference for a 3rd server.

anton (staff) wrote:P.S. The only way to avoid such an issue completely is going to multiple HA nodes. More then two. Then we'll have a voting quorum. And we'll represent such a solution quite soon. So stay tuned
I would like to echo this concern. For SQL database servers that are mirrored (functionally what the SAN servers are doing) I only need a witness server that doesn't have to be a fully kitted out production SQL server. I can see reasons why I would want a fully functional three legged HA SAN configuration (offsite replication/failover for instance), but I can see equal validity in just having a witness server in place to act as a quorum voter. Would it be possible to have both options available? I haven't got the budgets to absorb another fully provisioned SAN server.

hixont · Fri Sep 23, 2011 4:32 pm

anton (staff) wrote:1) We'll do support all of the listed scenarios. So you're not going to be forced to pick up working way.

Thanks.

anton (staff) wrote:2) What hypervisor do you run at this moment?

I am a Hyper-V (Windows 2008 R2) shop and bounce between Hyper-V Manager, Failover Cluster Manager and SCVMM 2008 R2 as my management consoles.