Cluster 2 nodes down after power outage

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

CedricT
Posts: 36
Joined: Mon Apr 15, 2019 1:14 pm

Tue Nov 03, 2020 12:12 pm

Hi all!

We ran into a situation we didn't expect on our VSAN cluster. We run two Hyper-V nodes on Windows Server 2016 with VSAN Free.

It was the classic case: someone caused a power outage and both servers went down. When they were switched back on, nothing happened, and we had to use "mark as synchronized" to make it work. Normally auto-sync should have worked, right? (Maybe with a little bit of split brain, as expected, but that's not the issue here.)

I checked the logs on both nodes and saw entries such as LOGIN_REJECT. Did we do something wrong?

Sincerely
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Tue Nov 03, 2020 4:40 pm

Welcome to the StarWind Forum. Here is what normally needs to be done: https://knowledgebase.starwindsoftware. ... -blackout/.
Is sync running right now? Are you getting these events while sync is running or after all is synchronized? Would love to have some logs. Use this tool https://knowledgebase.starwindsoftware. ... collector/ and share logs via Google Drive, Sharepoint, etc.

If that happened before the HA devices were synchronized, it may be the iSCSI initiator being unable to connect to the target (this is normal when an HA device is out of sync).
CedricT
Posts: 36
Joined: Mon Apr 15, 2019 1:14 pm

Wed Nov 04, 2020 11:20 am

Hi,

Thank you for your reply.

We are aware of the documentation. The sync is running. We used a script to mark the volume as synchronized, and it worked.

Still, that's not what we expected.

"After all nodes of the HA cluster were down, StarWind is by default able to determine which node holds the most recent data and starts the synchronization process automatically if all nodes are online." => That is what we were expecting, and it is what we got when we ran some tests. The sync would start over and over, but at least the volumes were up...

"In case when StarWind services can not determine which node contains the most recent data they block all incoming connections to prevent data corruption until one of the HA partners is marked as Synchronized." => that's the problem...

registerSession: Client initiator iqn.1991-05.com.microsoft:****************.int is trying to register a session within the 'iqn.2008-08.com.starwindsoftware:**************-csv1' target... (sessId = 0x14, initiatorNameIsid = iqn.1991-05.com.microsoft:*******,400001370003)
11/2 14:16:24.860396 ad4 HA: HASyncNode::registerSession: Unable to register the new client session. The node is not active!
11/2 14:16:24.860407 ad4 HA: HASyncNode::registerSession: Return code 21.
11/2 14:16:24.860423 ad4 Tgt: *** iScsiTarget::openSession: iqn.2008-08.com.starwindsoftware:********: can't register session. The device 'HAImage1' is not ready.
11/2 14:16:24.860435 ad4 T[14,1]: ***iScsiTask::startLoginPhase: *ERROR* Login request: device open failed.
11/2 14:16:24.860513 20d0 C[14], IN_LOGIN: iScsiConnection::doTransition: Event - LOGIN_REJECT.
11/2 14:16:24.860640 ad4 debug: *** Swn_SocketRecv: WSARecv() failed with error 10054 (0x2746)!
11/2 14:16:24.860663 ad4 Srv: *** SwSocket::Recv: Swn_SocketRecv() failed with error 10054 (0x2746)!
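
For anyone reading along with a similar excerpt, here is a minimal Python sketch that tallies the rejection-related entries in a log like the one above. It is not a StarWind tool: the line layout (date, time, thread id, message) is only inferred from the lines posted here, and the names `LINE_RE`, `MARKERS`, `classify` and `summarize` are made up for illustration. The one outside fact it relies on is that Winsock error 10054 is WSAECONNRESET (connection reset by the peer).

```python
import re
from collections import Counter

# Layout assumed from the excerpt above: "<month/day> <time> <thread id> <message>".
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-fA-F]+) "
    r"(?P<message>.*)$"
)

# Substrings copied from the posted log, mapped to hypothetical short labels.
MARKERS = {
    "The node is not active": "node_not_active",
    "is not ready": "device_not_ready",
    "LOGIN_REJECT": "login_reject",
    "failed with error 10054": "connection_reset",  # 10054 = WSAECONNRESET
}


def classify(line):
    """Return (time, label) for a recognised entry, otherwise None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # line does not follow the assumed layout
    for needle, label in MARKERS.items():
        if needle in match.group("message"):
            return match.group("time"), label
    return None


def summarize(lines):
    """Count how often each label appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        hit = classify(line)
        if hit is not None:
            counts[hit[1]] += 1
    return counts


# Lines taken verbatim from the excerpt in this post.
SAMPLE = """\
11/2 14:16:24.860396 ad4 HA: HASyncNode::registerSession: Unable to register the new client session. The node is not active!
11/2 14:16:24.860407 ad4 HA: HASyncNode::registerSession: Return code 21.
11/2 14:16:24.860423 ad4 Tgt: *** iScsiTarget::openSession: iqn.2008-08.com.starwindsoftware:********: can't register session. The device 'HAImage1' is not ready.
11/2 14:16:24.860435 ad4 T[14,1]: ***iScsiTask::startLoginPhase: *ERROR* Login request: device open failed.
11/2 14:16:24.860513 20d0 C[14], IN_LOGIN: iScsiConnection::doTransition: Event - LOGIN_REJECT.
11/2 14:16:24.860640 ad4 debug: *** Swn_SocketRecv: WSARecv() failed with error 10054 (0x2746)!
11/2 14:16:24.860663 ad4 Srv: *** SwSocket::Recv: Swn_SocketRecv() failed with error 10054 (0x2746)!
"""

if __name__ == "__main__":
    for label, count in summarize(SAMPLE.splitlines()).items():
        print(label, count)
```

Running the same `summarize` over the full log file from each node would show whether the rejections stop once the HA device has been marked as synchronized, which is the distinction asked about earlier in the thread.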
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Wed Nov 04, 2020 12:03 pm

Normally, StarWind VSAN is able to determine the side to synchronize from, but sometimes this is not possible. I am referring to situations where the mechanisms that prevent data corruption or loss are triggered.
Thanks for sharing the log, but I need the entire log. Please use this tool: https://knowledgebase.starwindsoftware. ... collector/