
Split Brain - what now?

Posted: Mon Jun 10, 2019 5:59 am
by denkteich
Hi all,

While testing VSAN Free, I forced a split brain.
Now the SyncState has been stuck at 3 (not synchronized) for several hours.

How can this be solved?
How can I resync?

cheers
.d

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 4:21 pm
by Boris (staff)
First of all, if both nodes are stuck in the non-synchronized state, you are not in a split-brain state. With split-brain, both sides would stay synchronized.
The steps you need to perform now are described in https://knowledgebase.starwindsoftware. ... -blackout/
Let us know if you need any additional information.

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 4:33 pm
by denkteich
Ok.

How can I perform these steps in PowerShell, as I only have the free version?

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 5:35 pm
by Boris (staff)
You can have a look at the PowerShell sample scripts SyncHADevice.ps1 and SyncHADeviceAdvanced.ps1 and customize them as necessary.
Both of them can help you start synchronizing the disks by marking either the current node's disk or the partner's disk as synchronized.

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 6:03 pm
by denkteich
Both scripts reach the point of synchronizing with the partner and then fail with:

Exception calling "ExecuteCommand" with "3" argument(s): "Request to STARWINDVSAN1.SH.LOCAL ( 127.0.0.1 ) : 3261
-
control 0x000000EE5472BE40 -RestorePartnerNode:"iqn.2008-08.com.starwindsoftware:172.16.16.182-vmsan02"
-
200 Failed: connection with partner node is invalid.. "


Any suggestions?

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 6:34 pm
by Boris (staff)
The first suggestion is checking networking between the hosts.
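
A basic reachability check between the hosts can be sketched as follows. This is a minimal Python sketch, not a StarWind tool. The partner address 172.16.16.182 and the management port 3261 are taken from the error message above; that the partner also needs to be reachable on port 3260 (the standard iSCSI port) is an assumption, and the function names are made up for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_targets(targets, timeout=3.0):
    """Map each (host, port) pair to True/False depending on reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}


# Example (not executed here): the partner address from the error message,
# on the management port 3261 and on 3260 (assumed, standard iSCSI port).
# check_targets([("172.16.16.182", 3261), ("172.16.16.182", 3260)])
```

A successful TCP connect only shows that the port is reachable from this host. It does not prove that the synchronization channel itself is configured correctly, so it is worth running the check from both nodes and on every interface used for sync and heartbeat.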

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 6:37 pm
by denkteich
That was the first thing I did.
The network is running fine.

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 7:21 pm
by denkteich
I ran the second part of the advanced script.

Now I have this status:
Server: 10.0.0.181 - HAImage1
SyncStatus: synchronized
SyncPercent: 0 %
-----------
Server: 10.0.0.182 - HAImage1
SyncStatus: not synchronized
SyncPercent: 0 %
-----------

No change since then, and no sync.

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 7:29 pm
by Boris (staff)
You should have StarWind Management Console available in read-only mode. Do you see the sync running there? You have now marked one of the devices as synchronized, and it is already available for client connections. The other one should be reported as synchronizing.

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 7:33 pm
by denkteich
No change.
1st node: synchronized, 0%
2nd node: not synchronized, 0%

The 2nd node doesn't start synchronizing.
Is there any way to force this?

Re: Split Brain - what now?

Posted: Mon Jun 10, 2019 8:01 pm
by Boris (staff)
Try running a customized version of SyncHADevice.ps1 that would force sync start for that device.

Re: Split Brain - what now?

Posted: Tue Jun 11, 2019 6:52 am
by denkteich
What is the command to force it?

Is there any complete documentation of the PowerShell commands, including object references, all methods, and all properties?

Re: Split Brain - what now?

Posted: Tue Jun 11, 2019 7:03 am
by denkteich
I discovered one thing:
On the first node the device is named: iqn.2008-08.com.starwindsoftware:-vmsan01
On the 2nd node: iqn.2008-08.com.starwindsoftware:172.16.16.182-vmsan02

Shouldn't the IP be included in the first node's name as well?
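
To make the difference between the two names easier to see, they can be split into their parts. This is a rough Python sketch: the general layout iqn.<yyyy-mm>.<naming authority>:<unique part> is the standard IQN format, while reading the unique part as <prefix>-<alias> is only an assumption based on the two names above. The function names are made up for illustration, and the sketch does not tell whether the empty prefix matters to StarWind.

```python
import re

# iqn.<yyyy-mm>.<reversed domain>[:<unique part>]
IQN_PATTERN = re.compile(r"^iqn\.(\d{4}-\d{2})\.([^:]+)(?::(.*))?$")


def split_iqn(iqn):
    """Split an IQN into its date, naming authority, and unique part."""
    match = IQN_PATTERN.match(iqn)
    if match is None:
        raise ValueError("not a valid IQN: %r" % iqn)
    date, authority, unique = match.groups()
    return {"date": date, "authority": authority, "unique": unique or ""}


def split_unique(unique):
    """Split '<prefix>-<alias>' at the last hyphen (assumed layout)."""
    prefix, _, alias = unique.rpartition("-")
    return prefix, alias


for name in (
    "iqn.2008-08.com.starwindsoftware:-vmsan01",
    "iqn.2008-08.com.starwindsoftware:172.16.16.182-vmsan02",
):
    parts = split_iqn(name)
    prefix, alias = split_unique(parts["unique"])
    print("alias=%s prefix=%r" % (alias, prefix))
```

For the first name the prefix comes out empty, and for the second it is the address 172.16.16.182.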

Re: Split Brain - what now?

Posted: Tue Jun 11, 2019 3:10 pm
by denkteich
I'm one step further.

After restarting, the 2nd node synchronized.
But now I have this:

Server: 10.0.0.181 - HAImage1
SyncStatus: synchronized
SyncPercent: 0 %
-----------
Server: 10.0.0.182 - HAImage1
SyncStatus: synchronized
SyncPercent: 100 %
-----------

How do I get the 1st node to 100%?

Re: Split Brain - what now?

Posted: Tue Jun 11, 2019 6:06 pm
by Boris (staff)
Once the devices are synchronized, i.e. they report a status of 1, the sync percentage no longer matters. One node will report it as 100%, the other will show 0%. This is expected. The main thing is to confirm that the status is 1, i.e. synchronized. So, your setup is fine at the moment.
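
The rule above (look at the status, ignore the percentage) can be sketched as a small check on the status report. This is a minimal Python sketch that assumes the report has exactly the text layout posted earlier in this thread. The function names are made up for illustration, and only the two status codes mentioned in the thread (1 and 3) are listed.

```python
# Status codes mentioned in this thread; other codes are not covered here.
SYNC_STATUS_NAMES = {1: "synchronized", 3: "not synchronized"}


def parse_status_report(text):
    """Parse 'Key: value' blocks separated by dashed lines into a list of dicts."""
    entries = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("---"):
            # A dashed line closes the current block.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def all_synchronized(entries):
    """True if every node reports 'synchronized'; SyncPercent is ignored on purpose."""
    return bool(entries) and all(
        entry.get("SyncStatus") == SYNC_STATUS_NAMES[1] for entry in entries
    )


report = """Server: 10.0.0.181 - HAImage1
SyncStatus: synchronized
SyncPercent: 0 %
-----------
Server: 10.0.0.182 - HAImage1
SyncStatus: synchronized
SyncPercent: 100 %
-----------
"""
print(all_synchronized(parse_status_report(report)))
```

With the report from the last status post, both nodes count as healthy even though one of them shows 0 %.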