2-node HA split brain

Software-based VM-centric and flash-friendly VM storage + free version


thirdbird
Posts: 9
Joined: Tue Feb 14, 2017 8:21 am

Sat Dec 02, 2017 5:12 pm

I'd like to follow up on this thread a bit.

I've done some testing after getting familiar with the PowerShell scripts and setting up a 2-node homelab with only laptops, so a single NIC is used for everything. That means that when I pull a plug, the nodes become entirely separated. I set up a VM on the shared VSAN storage, and pulling the plug immediately created a split-brain situation: the nodes did not even try to resync the latest changes made in a shared VM folder.

However, if I just stopped the service on Node2, so that the NIC stayed up, it triggered an exception and a resync when Node2 came back online. The positive thing, though, is that the recent changes I had made while the VM was running on Node1 were synced to Node2 as they should be, not the other way around, even though the sync script says "from partner" (and yes, Node2 was the partner). Is this because of some shared quorum technique?

I'm rethinking using 2 nodes: if they lose all contact, it's split brain for sure. A 2-node setup simply needs to stay connected at all times to be practical, and that takes away the HA aspect of it for me. I'm also having problems running a failover cluster without a DC on 2 small servers, so I'm not sure what I'm gonna do at all.
wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Sat Dec 02, 2017 10:39 pm

I figured that in most environments the network layout would rarely even allow clients to reach the two servers independently while the network between the servers was down, so split brain would be unlikely to be a problem.

And Windows Server 2016 failover clusters should be able to boot with the DC as a guest, I think.

Having said that, I just had a total cluster crash, apparently due to the StarWind trial license expiring. I sure hope the StarWind iSCSI connections don't depend on an external DC for DNS, because both of my DCs are guests in the failover cluster.

-- Ken
------------------------
"In theory, theory and practice are the same, but in practice they're not." -- Yogi Berra
Michael (staff)
Staff
Posts: 317
Joined: Thu Jul 21, 2016 10:16 am

Thu Dec 07, 2017 3:10 pm

Hello Ken,
The best way to avoid split-brain is to connect the StarWind servers directly and to run heartbeat over separate NICs in addition to the Synchronization ones.
As far as I know, a StarWind support engineer helped you restore production, and the cluster shared volume came up even with the DCs inside the cluster (Windows Server 2016). Still, I would recommend keeping DCs outside the cluster.
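To make the role of a separate heartbeat link concrete, here is a minimal Python sketch of the decision each node faces when links fail under a heartbeat-style strategy. It is a hypothetical model, not StarWind's code: the names `Action` and `on_link_change` are illustrative, and it assumes heartbeats also travel over the sync link.

```python
from enum import Enum


class Action(Enum):
    KEEP_SERVING = "both nodes serve, writes keep mirroring"
    DEMOTE_ONE = "one node is marked not synchronized and stops serving"
    SPLIT_BRAIN = "both nodes serve independently"


def on_link_change(sync_up: bool, heartbeat_up: bool) -> Action:
    """Hypothetical failover decision for a 2-node mirror (not StarWind code)."""
    if sync_up:
        # Assumption: heartbeats also travel over the sync link, so the
        # partner is reachable and every write is still mirrored.
        return Action.KEEP_SERVING
    if heartbeat_up:
        # Mirroring stopped, but the nodes can still talk over a separate
        # NIC, so they can agree that only one of them keeps serving.
        return Action.DEMOTE_ONE
    # No path between the nodes at all: each one assumes its partner died,
    # keeps accepting writes, and the two copies diverge.
    return Action.SPLIT_BRAIN


# A single shared NIC (as in a laptop homelab) loses both links at once.
print(on_link_change(sync_up=False, heartbeat_up=False).value)
```

The point of the extra NICs is to make the last branch as unlikely as possible: as long as any heartbeat path survives, the nodes can still coordinate.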
Quote: "I sure hope the StarWind iSCSI connections don't depend on an external DC for DNS"

Of course, iSCSI connections do not depend on a DC or DNS.

Thirdbird,
As I wrote before, you will get split-brain if the synchronization and heartbeat channels go down simultaneously: both devices will remain marked as synchronized, and you will not see any change in the devices' state. In your test, after the StarWind service restarted on Node2, StarWind synchronized data to Node2, which it always does by design. As long as changes were made only on Node1, you will not have any data corruption.
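The resync direction described above can be pictured with a small Python model. It is a hypothetical sketch, not StarWind's implementation: `Node` and `resync_to` are illustrative names, and it assumes that the node which went away is always the sync target and that the surviving node's copy is treated as authoritative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    synchronized: bool = True
    blocks: dict = field(default_factory=dict)  # block index -> data


def resync_to(target: Node, source: Node) -> None:
    """Hypothetical: the node that went away is the sync target."""
    # The source kept serving I/O, so its copy is treated as authoritative
    # and the target's copy is overwritten.
    target.blocks = dict(source.blocks)
    target.synchronized = True


node1 = Node("Node1", blocks={0: "base"})
node2 = Node("Node2", blocks={0: "base"})

# Service stopped on Node2: Node1 keeps serving and takes a new write.
node2.synchronized = False
node1.blocks[1] = "written on Node1"

# Service back on Node2: data flows Node1 -> Node2, never the reverse.
resync_to(node2, node1)
print(node2.blocks)  # {0: 'base', 1: 'written on Node1'}
```

The same model shows why a real split brain is dangerous: if Node2 had also accepted writes while isolated, a one-way resync like this would silently overwrite them.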

You will have a chance to test the next release, with the Node Majority failover strategy, very soon ;)
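For readers wondering how a majority-based strategy addresses the "2 nodes lose all contact" case, here is a minimal Python sketch of the generic majority-quorum rule. It is an assumption about the general technique, not a description of StarWind's Node Majority implementation; `has_quorum` is a hypothetical name, and the third vote is assumed to come from a witness.

```python
def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """A node keeps serving only if it can see a strict majority of all votes."""
    return 2 * visible_votes > total_votes


# Two nodes, no witness: an isolated node sees only its own vote.
print(has_quorum(visible_votes=1, total_votes=2))  # False: both nodes stop, no split brain

# Two nodes plus a witness: the node that still reaches the witness sees 2 of 3.
print(has_quorum(visible_votes=2, total_votes=3))  # True: this node keeps serving
print(has_quorum(visible_votes=1, total_votes=3))  # False: the isolated node stops
```

Without a third vote, a majority rule avoids split brain only by stopping both nodes, which is why 2-node majority setups typically rely on a witness to keep one side running.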