Cluster 2 nodes down after power outage
Posted: Tue Nov 03, 2020 12:12 pm
by CedricT
Hi all!
We ran into a situation we didn't expect on our StarWind VSAN cluster. We use two Hyper-V nodes running Windows Server 2016 with StarWind VSAN Free.
Someone caused the classic power outage and both servers went down. When they were switched back on, nothing happened, and we had to mark the devices as synchronized to get things working. Normally auto-sync should have worked, right? (Maybe with a little bit of split brain, as expected, but that's not the issue here.)
I checked the logs on both nodes and saw entries like LOGIN_REJECT. Is there something we did wrong?
Sincerely
Re: Cluster 2 nodes down after power outage
Posted: Tue Nov 03, 2020 4:40 pm
by yaroslav (staff)
Welcome to StarWind Forum. Here is what normally needs to be done:
https://knowledgebase.starwindsoftware. ... -blackout/.
Is sync running right now? Are you getting these events while sync is running, or after everything is synchronized? I would love to have some logs. Use this tool
https://knowledgebase.starwindsoftware. ... collector/ and share the logs via Google Drive, SharePoint, etc.
If that happened before the HA devices were synchronized, it may be the iSCSI initiator being unable to connect to the target (which is normal while an HA device is out of sync).
Re: Cluster 2 nodes down after power outage
Posted: Wed Nov 04, 2020 11:20 am
by CedricT
Hi,
Thank you for your reply.
We are aware of the documentation. The sync is running. We used a script to mark the volume as synchronized, and it worked.
But that's still not what we expected.
"After all nodes of the HA cluster were down, StarWind is by default able to determine which node holds the most recent data and starts the synchronization process automatically if all nodes are online." => We were expecting that, and it is what we got when we tested it: the sync started over and over, but at least the volumes were UP...
"In case when StarWind services can not determine which node contains the most recent data they block all incoming connections to prevent data corruption until one of the HA partners is marked as Synchronized." => that's the problem...
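To make the two quoted behaviours easier to compare, here is a minimal Python sketch of the decision rule as the documentation describes it. This is only a model of the quoted text, not StarWind's actual logic: the function name, the node names and the "wait" branch for a partner that is still offline are hypothetical assumptions.

```python
from typing import Optional


def recovery_action(all_online: bool,
                    most_recent: Optional[str],
                    marked_synced: Optional[str]) -> str:
    """Hypothetical model of the documented post-blackout behaviour.

    all_online    -- True once every HA partner is back up
    most_recent   -- node judged to hold the newest data, or None if
                     that could not be determined
    marked_synced -- node an administrator manually marked as Synchronized
    """
    # A manual "mark as Synchronized" picks the sync source explicitly.
    if marked_synced is not None:
        return f"synchronize from {marked_synced}"
    # Newest data unknown: refuse client connections to avoid corruption.
    if most_recent is None:
        return "block client connections"
    # Assumption: automatic sync only starts when all nodes are online.
    if not all_online:
        return "wait for all nodes"
    return f"synchronize from {most_recent}"


# The case we hit after the outage: no node could be identified as newest.
print(recovery_action(all_online=True, most_recent=None, marked_synced=None))
```

In this model our situation falls into the "block client connections" branch, which only a manual "mark as Synchronized" gets out of.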
registerSession: Client initiator iqn.1991-05.com.microsoft:****************.int is trying to register a session within the 'iqn.2008-08.com.starwindsoftware:**************-csv1' target... (sessId = 0x14, initiatorNameIsid = iqn.1991-05.com.microsoft:*******,400001370003)
11/2 14:16:24.860396 ad4 HA: HASyncNode::registerSession: Unable to register the new client session. The node is not active!
11/2 14:16:24.860407 ad4 HA: HASyncNode::registerSession: Return code 21.
11/2 14:16:24.860423 ad4 Tgt: *** iScsiTarget::openSession: iqn.2008-08.com.starwindsoftware:********: can't register session. The device 'HAImage1' is not ready.
11/2 14:16:24.860435 ad4 T[14,1]: ***iScsiTask::startLoginPhase: *ERROR* Login request: device open failed.
11/2 14:16:24.860513 20d0 C[14], IN_LOGIN: iScsiConnection::doTransition: Event - LOGIN_REJECT.
11/2 14:16:24.860640 ad4 debug: *** Swn_SocketRecv: WSARecv() failed with error 10054 (0x2746)!
11/2 14:16:24.860663 ad4 Srv: *** SwSocket::Recv: Swn_SocketRecv() failed with error 10054 (0x2746)!
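To check whether these rejections only happen before the devices are synchronized, you could pull just the relevant lines out of a full log. Below is a minimal Python sketch that assumes the line layout seen in the excerpt (month/day, time with microseconds, hex thread id, message). The regex and the marker strings are taken from this excerpt only and are assumptions, not a documented StarWind log format.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, based on the excerpt: "11/2 14:16:24.860396 ad4 <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+"
    r"(?P<message>.*)$"
)

# Substrings from the excerpt that accompany a refused iSCSI login.
MARKERS = ("The node is not active", "is not ready", "LOGIN_REJECT")


class Entry(NamedTuple):
    date: str
    time: str
    thread: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Split one log line into its fields; None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["date"], m["time"], m["thread"], m["message"])


def rejection_events(lines: Iterable[str]) -> List[Entry]:
    """Return only the entries that indicate a rejected login."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and any(k in entry.message for k in MARKERS):
            hits.append(entry)
    return hits
```

Comparing the timestamps of these entries with the time the sync finished would answer the question of whether they appear only while the HA device is out of sync.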
Re: Cluster 2 nodes down after power outage
Posted: Wed Nov 04, 2020 12:03 pm
by yaroslav (staff)
Normally, StarWind VSAN is able to determine which side to synchronize from, but sometimes that is not possible. I am referring to situations where the mechanisms that prevent data corruption/loss are triggered.
Thanks for sharing the log, but I need the entire log. Please use this tool:
https://knowledgebase.starwindsoftware. ... collector/