
2 node HA split brain

Posted: Thu Aug 03, 2017 10:41 pm
by muhfugen
From what I've read on the forums (https://forums.starwindsoftware.com/vie ... f=5&t=3440), split brain can be an issue in a 2-node HA configuration when a total network failure occurs between the two nodes. I was wondering if anything has been done to solve this issue in the past three years, such as being able to have a witness?

Re: 2 node HA split brain

Posted: Fri Aug 04, 2017 12:48 pm
by anton (staff)
a) StarWind has redundant heartbeat networks and a dynamic witness (the same way a Windows Server 2012/2016 cluster works), which eliminates the need for an external witness.

b) You can run 3 nodes now.

c) You can have a witness with the upcoming R6 update (end of August 2017). You might want to play with the beta right away.

All of these a), b) and c) solve what you're afraid of.

Re: 2 node HA split brain

Posted: Fri Aug 04, 2017 4:27 pm
by muhfugen
Thanks a lot Anton.

Re: 2 node HA split brain

Posted: Fri Aug 04, 2017 4:50 pm
by muhfugen
anton (staff) wrote:dynamic witness (same way Windows Server 2012/2016 cluster works)
Do you know where I could find more information about configuring a dynamic witness in VSAN? I can't seem to find much documentation via Google beyond forum posts and the Geo Clustering PDF, which mentions it in passing.

Re: 2 node HA split brain

Posted: Sat Aug 05, 2017 9:52 am
by anton (staff)
That's beta functionality. Drop a line to anton AT starwind DOT com and I'll put you in touch with the techies for preview builds and early documentation. Just mention in the subject what it's all about ;)

Re: 2 node HA split brain

Posted: Wed Nov 22, 2017 4:26 pm
by wallewek
Let me see if I understand this correctly.

Are you saying that a StarWind Free HA 2-node VSAN is currently susceptible to split-brain failure -- having both hosts coming up and running independently -- and there's nothing we can currently do about it? I.e., there's no witness/quorum functionality at the VSAN level in StarWind Free?

And are you saying that this functionality does exist in the paid-license version?

Is there a document somewhere to which you would refer me to clarify this? Any recommendations on how to prevent it, beyond simple redundancy?

Thanks for any clarification you could provide.

-- Ken

Re: 2 node HA split brain

Posted: Wed Nov 22, 2017 4:50 pm
by anton (staff)
I never said anything like that! StarWind vSAN Free is 100% identical in functionality to the commercial version; the differences are the thick UI and the various support plans.

https://www.starwindsoftware.com/whitep ... s-paid.pdf

If you want to be completely paranoid (only the paranoid survive, according to A. Grove), you can combine redundant heartbeat networks and an external witness. I think we'll release it with our next update. You can apply for the RC right now.

Re: 2 node HA split brain

Posted: Thu Nov 23, 2017 4:44 pm
by wallewek
Thank you, Anton, and pardon my misunderstanding.

But I'm still looking for a better understanding: HOW does StarWind prevent split-brain operation in a two-host HA environment?

For instance, what would happen if all network connectivity between the two physical hosts were suddenly lost, but the hosts were otherwise unaffected, and still reachable by other systems?

Is there, for example, some sort of status on each host that tells them which one is allowed to run independently, and the other not, unless a human intervenes? Like a "quorum stick" that is owned by one of them at any given time? And if the owning host has failed, the other still cannot start without human intervention?

If you could explain, or refer me to documentation, I would appreciate it.

You mentioned an external witness functionality that is due to be released. I would like to know more details about how that works, too.

Some background:
Years ago, we used a different two-host HA storage virtualization product called VM6 VMEX, whose vendor has gone out of business. There was a storm that caused abrupt power failure and partial network hardware failure. When power came back on, the cluster was unable to resume operation at all because of the lack of network functionality between the hosts, or external quorum, until we came on site to resolve things.

As a result of that incident, I've given a lot of thought to the question of two-host quorum: in principle, it doesn't take much to have somewhere to store a status flag, a lock, something giving one host quorum. I've even thought about using ARP cache, cloud-based storage or a symbolic DNS alias.
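To make the "lock" idea concrete, here is a minimal sketch, assuming both hosts can reach some shared location (a hypothetical `lock_path`) that supports atomic exclusive file creation. The function name and the lock file are my own illustration and are not anything StarWind provides. Whichever host creates the file first holds quorum; the other sees that the file exists and must not start on its own.

```python
import os


def try_acquire_quorum(lock_path: str, host: str) -> bool:
    """Atomically create the lock file; only the first caller succeeds.

    lock_path is a hypothetical shared location reachable by both hosts.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so two hosts
        # racing for the lock cannot both win.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(host)  # record the owner for later human inspection
    return True
```

A real tiebreaker would also need to handle a stale lock left by a dead owner, and exclusive creation is not reliably atomic on every network filesystem, so treat this as an illustration of the principle only.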

So I'm really interested in the details.

-- Ken

Re: 2 node HA split brain

Posted: Fri Nov 24, 2017 9:34 am
by Boris (staff)
wallewek,

Hope this Knowledge Base article can answer your question.
https://knowledgebase.starwindsoftware. ... planation/

Re: 2 node HA split brain

Posted: Fri Nov 24, 2017 3:45 pm
by wallewek
Thank you Boris,

I would say that KB article is incomplete, but it does appear to confirm one of the failure modes I described.
It says:
If data can't be transferred through the synchronization channel StarWind checks the availability of the second node through the alternate network interface, and shuts down the secondary node in case of synchronization channel failure.
Which implies a primary/secondary or "quorum stick" (my term) approach.

Therefore, if the "primary" host in a 2-host HA cluster abruptly fails, the cluster as a whole will fail as well, because the heartbeat and sync will have stopped, and there will not have been an opportunity for the cluster software to automatically "fail over" to the other host, as would occur in a controlled shutdown.

Thus human intervention will be required for cluster recovery from abrupt failure of the primary host.

I presume that is what the beta witness functionality, described earlier in this thread, is intended to address.

Please provide some information on how that witness will work, and its infrastructure requirements.

-- Ken

Re: 2 node HA split brain

Posted: Mon Nov 27, 2017 4:58 pm
by Michael (staff)
Ken,
Let me explain a little bit how it works.
With the heartbeat failover strategy, each HA device has an assigned priority number. If the synchronization channel is down, the StarWind services talk to each other via the heartbeat channel, and the device with the highest priority number is marked as not synchronized by design. If the synchronization and heartbeat channels disappear simultaneously, both devices stay synchronized, which will certainly lead to split-brain. That is why we recommend assigning additional independent heartbeat channels during replica creation, to make sure that one of the nodes becomes not synchronized. In summary, with the heartbeat failover strategy, the storage cluster continues working with only one StarWind node available.
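The heartbeat rules above can be sketched as a small decision function. This is an illustrative model only, not StarWind's actual code: the `HANode` class, the `priority` field, and the returned state strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class HANode:
    name: str
    priority: int              # hypothetical field; highest number yields
    synchronized: bool = True


def resolve_heartbeat(nodes, sync_up, heartbeat_up):
    """Apply the heartbeat failover rule to a pair of HA nodes."""
    if sync_up:
        return "ok"  # replication works, nothing to arbitrate
    if heartbeat_up:
        # The nodes can still talk over the heartbeat channel: the one with
        # the highest priority number is marked not synchronized and the
        # other keeps serving clients.
        loser = max(nodes, key=lambda n: n.priority)
        loser.synchronized = False
        return "degraded"
    # Neither channel works: no arbitration is possible, so both nodes
    # stay synchronized and keep serving independently.
    return "split-brain"
```

In this model a single node cannot tell a dead partner apart from a total loss of both channels, which would explain both why a lone survivor keeps working and why losing every channel at once ends in split-brain.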
With the node majority failover strategy, HA devices have only a synchronization channel, and each of them must have a connection to a Witness node, which is part of the HA device but contains no data. In this scenario, the main requirement for keeping a node operational is an active connection to more than half of the HA device's nodes. Nodes that can communicate with more than half of the device's nodes (including themselves) remain operational. In summary, with the node majority failover strategy with 2 storage nodes and one Witness node, if one node does not see the others, it marks itself as not synchronized and rejects client connections.
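The majority rule is simple arithmetic, shown here as a minimal sketch. The function names and state strings are hypothetical and only model the rule as described in this post; they are not a StarWind API.

```python
def has_majority(reachable_peers: int, total_nodes: int) -> bool:
    """True if this node, counting itself, sees more than half of all nodes."""
    return (reachable_peers + 1) * 2 > total_nodes


def node_state(reachable_peers: int, total_nodes: int = 3) -> str:
    """State of one node in an HA device (default: 2 storage nodes + Witness)."""
    if has_majority(reachable_peers, total_nodes):
        return "synchronized"
    # An isolated node yields and rejects client connections.
    return "not synchronized"
```

With 3 nodes, a node that still sees one other node (the partner or the Witness) counts 2 of 3 and stays up, while a fully isolated node counts 1 of 3 and yields. The same arithmetic shows why two nodes without a witness cannot use this strategy: an isolated node counts 1 of 2, which is not more than half.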
Once the Witness node feature is released, we will publish documentation about it. Please let us know if you have other questions.

Re: 2 node HA split brain

Posted: Tue Nov 28, 2017 2:43 am
by wallewek
Thank you very much Michael, that really helps.

One thing: I thought StarWind 3-node clusters already had some sort of quorum function, do they not?

-- Ken

Re: 2 node HA split brain

Posted: Tue Nov 28, 2017 10:05 am
by Michael (staff)
In the current build, the StarWind 3-node configuration has only the heartbeat failover strategy.

Re: 2 node HA split brain

Posted: Tue Nov 28, 2017 9:08 pm
by wallewek
Thank you Michael, that's very helpful.

-- Ken

Re: 2 node HA split brain

Posted: Tue Nov 28, 2017 11:06 pm
by Michael (staff)
You are welcome :)
Please let us know if you have other questions.