Split Brain - what now?

denkteich
Posts: 32
Joined: Fri Jun 07, 2019 7:52 am

Mon Jun 10, 2019 5:59 am

Hi all,

While testing VSAN Free, I forced a split brain.
The SyncState has now been stuck at 3 (not synchronized) for several hours.

How can this be solved?
How can I resync?

cheers
.d
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Jun 10, 2019 4:21 pm

First of all, if both nodes are stuck in the not-synchronized state, this is not a split brain. In a split brain, both sides would remain synchronized.
The steps you need to perform now are described in https://knowledgebase.starwindsoftware. ... -blackout/
Let us know if you need any additional information.
denkteich

Mon Jun 10, 2019 4:33 pm

Ok.

How can I perform these steps in PowerShell, since I only have the free version?
Boris (staff)

Mon Jun 10, 2019 5:35 pm

You can look at the PowerShell sample scripts SyncHADevice.ps1 and SyncHADeviceAdvanced.ps1 and customize them as necessary.
Both of them can help you start synchronizing the disks, either by marking the current node's disk as synchronized or by marking the partner's disk as synchronized.
denkteich

Mon Jun 10, 2019 6:03 pm

Both scripts get as far as synchronizing with the partner and then fail with:

Exception calling "ExecuteCommand" with "3" argument(s): "Request to STARWINDVSAN1.SH.LOCAL ( 127.0.0.1 ) : 3261
-
control 0x000000EE5472BE40 -RestorePartnerNode:"iqn.2008-08.com.starwindsoftware:172.16.16.182-vmsan02"
-
200 Failed: connection with partner node is invalid.. "

Any suggestions?
Boris (staff)

Mon Jun 10, 2019 6:34 pm

My first suggestion is to check the network connectivity between the hosts.
denkteich

Mon Jun 10, 2019 6:37 pm

That was the first thing I did.
The network is working fine.
denkteich

Mon Jun 10, 2019 7:21 pm

I ran the second part of the advanced script.

Now I have this status:
Server: 10.0.0.181 - HAImage1
SyncStatus: synchronized
SyncPercent: 0 %
-----------
Server: 10.0.0.182 - HAImage1
SyncStatus: not synchronized
SyncPercent: 0 %
-----------

Nothing changes; no synchronization is taking place.
Boris (staff)

Mon Jun 10, 2019 7:29 pm

StarWind Management Console should still be available to you in read-only mode. Do you see synchronization running there? You have now marked one of the devices as synchronized, so it is already available for client connections. The other one should be reported as synchronizing.
denkteich

Mon Jun 10, 2019 7:33 pm

No change.
1st node: synchronized, 0%
2nd node: not synchronized, 0%

The 2nd node doesn't start synchronizing.
Is there any way to force it?
Boris (staff)

Mon Jun 10, 2019 8:01 pm

Try running a customized version of SyncHADevice.ps1 that forces synchronization to start for that device.
denkteich

Tue Jun 11, 2019 6:52 am

What is the command to force it?

Is there any complete documentation of the PowerShell commands, including object references and all methods and properties?
denkteich

Tue Jun 11, 2019 7:03 am

I noticed one thing:
on the first node the device is named iqn.2008-08.com.starwindsoftware:-vmsan01
on the 2nd node it is named iqn.2008-08.com.starwindsoftware:172.16.16.182-vmsan02

Shouldn't the first node's name include the IP as well?
denkteich

Tue Jun 11, 2019 3:10 pm

I'm one step further.

After a restart, the 2nd node synchronized.
But now I have this:

Server: 10.0.0.181 - HAImage1
SyncStatus: synchronized
SyncPercent: 0 %
-----------
Server: 10.0.0.182 - HAImage1
SyncStatus: synchronized
SyncPercent: 100 %
-----------

How do I get the 1st node to 100%?
Boris (staff)

Tue Jun 11, 2019 6:06 pm

Once the devices are synchronized, i.e. report a status of 1, the sync percentage no longer matters. One node will report 100% and the other 0%; this is expected. The main thing is to confirm that the status is 1 (synchronized). So your setup is fine at the moment.
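To make that rule concrete, here is a minimal Python sketch (not a StarWind tool) that parses status text in the format printed by the sample script earlier in this thread. It treats the HA pair as healthy only when every node reports "synchronized" and deliberately ignores SyncPercent. The regular expression and the names STATUS_BLOCK, parse_status and pair_is_healthy are hypothetical. The sketch assumes the script output always has exactly the layout quoted here, and it does not call any StarWind API.

```python
import re

# Hypothetical pattern for one status block as printed by the sample script in this
# thread ("Server: <ip> - <device>", "SyncStatus: <text>", "SyncPercent: <n> %").
# Assumption: the script always prints exactly this layout.
STATUS_BLOCK = re.compile(
    r"Server:\s*(?P<server>\S+)\s*-\s*(?P<device>\S+)\s*"
    r"SyncStatus:\s*(?P<status>[a-z ]+?)\s*"
    r"SyncPercent:\s*(?P<percent>\d+)\s*%"
)


def parse_status(text):
    """Return (server, device, status, percent) for every status block found in text."""
    return [
        (m["server"], m["device"], m["status"], int(m["percent"]))
        for m in STATUS_BLOCK.finditer(text)
    ]


def pair_is_healthy(nodes):
    """Healthy only if at least one node was parsed and every node reports
    'synchronized'. SyncPercent is ignored on purpose: one node showing 0 %
    and the other 100 % is expected once both are synchronized."""
    return bool(nodes) and all(status == "synchronized" for _, _, status, _ in nodes)


# Status text copied from the last output posted in this thread.
SAMPLE = """\
Server: 10.0.0.181 - HAImage1
SyncStatus: synchronized
SyncPercent: 0 %
-----------
Server: 10.0.0.182 - HAImage1
SyncStatus: synchronized
SyncPercent: 100 %
-----------
"""

if __name__ == "__main__":
    for server, device, status, percent in parse_status(SAMPLE):
        print(f"{server} {device}: {status} ({percent} %)")
    print("healthy:", pair_is_healthy(parse_status(SAMPLE)))
```

Run against the earlier output in this thread (one node synchronized, the other not synchronized), pair_is_healthy returns False. That matches the state in which the 2nd node had not yet started synchronizing.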