Split Brain issues, Scale Out File server


enzoescalante
Posts: 1
Joined: Wed Mar 03, 2021 6:08 pm

Fri Jul 30, 2021 2:48 pm

Hello. We have a two-node cluster with a witness disk for quorum. The nodes are two Windows Server 2016 virtual machines, connected by three network interfaces: heartbeat, sync, and cluster management.
The cluster has 2 disks: 1 CSV and the witness disk. Both disks are connected via iSCSI.
We configured the StarWind software following this guide: https://www.starwindsoftware.com/resour ... r-2012-r2/

The configuration is: Failover strategy: Heartbeat; Mode: Synchronous.
Lately we have experienced synchronization problems between the nodes. If communication between the nodes is interrupted, one of them is marked as "not synchronized" and does not resume synchronization automatically. One workaround that StarWind support gave us was to disconnect and reconnect the synchronization interface.

Last weekend, the cluster nodes lost communication with each other for 5 minutes due to network problems. The shared folders hosted on the cluster were not available from the client computers. There was a total failure of service.

In the StarWind Management Console, all disks were marked red (not synchronized). In the Microsoft Failover Cluster Manager, all disks were marked offline, and so were all cluster roles.

Investigating the Windows log (cluster.log), I found that the cluster entered a split-brain condition at that moment. I cannot understand why this happened. My assumption was that, in addition to the heartbeat interface, the witness disk connected via iSCSI would provide quorum, so that the service would not fail and the nodes would not both enter a split-brain condition.

Could you help me understand what happened here?
Michael (staff)
Staff
Posts: 317
Joined: Thu Jul 21, 2016 10:16 am

Sun Aug 01, 2021 3:03 am

Please contact support for a configuration review.
In most cases, the cluster enters a split-brain state when the nodes cannot send heartbeats to each other. Because of that, Microsoft recommends establishing redundant links for cluster communication.
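To illustrate why a witness disk alone may not keep the service up, here is a minimal sketch of majority-vote quorum for two nodes plus a disk witness. It is a simplified model, not the actual Windows cluster logic: the function names and the three-vote layout are assumptions for illustration. It shows that when the nodes are isolated from each other and neither can reach the witness (for example, because the iSCSI device backing it is itself unavailable), no partition holds a majority.

```python
# Simplified model of majority quorum: 2 nodes + 1 disk witness = 3 votes.
# All names here are hypothetical and chosen only for illustration.

TOTAL_VOTES = 3  # node A, node B, witness disk


def votes_seen(sees_partner: bool, owns_witness: bool) -> int:
    """Votes a node can count: its own, its partner's, and the witness."""
    return 1 + int(sees_partner) + int(owns_witness)


def has_quorum(votes: int, total: int = TOTAL_VOTES) -> bool:
    """A partition keeps running only with a strict majority of votes."""
    return votes > total // 2


# Nodes isolated, but one of them still reaches the witness disk:
print(has_quorum(votes_seen(sees_partner=False, owns_witness=True)))   # True
# Nodes isolated and the witness disk is unreachable from both:
print(has_quorum(votes_seen(sees_partner=False, owns_witness=False)))  # False
```

In the second case both nodes lose quorum at the same time, which would be consistent with all disks and roles going offline rather than one node taking over.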
If you are talking about a possible split-brain at the StarWind VSAN level with the heartbeat failover strategy, the reason is the same: the heartbeat link is lost, and then the synchronization link. Since this is a virtual environment, it could happen during a network driver update, for example a VMware Tools update.
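As a rough sketch of the heartbeat failover strategy described above, the decision each node makes can be modeled as follows. This is a simplified illustration under my own assumptions, not StarWind's implementation: the `Action` names and the `decide` function are hypothetical. The point it shows is that only the loss of both the synchronization and heartbeat links leaves each node believing its partner is down.

```python
from enum import Enum


class Action(Enum):
    KEEP_SYNCING = "sync link is up, continue normal operation"
    BLOCK_LOWER_PRIORITY = "partner is alive, lower-priority node stops serving"
    ASSUME_PARTNER_DOWN = "partner looks dead, mark it not synchronized"


def decide(sync_ok: bool, heartbeat_ok: bool) -> Action:
    """Hypothetical per-node decision when links are checked."""
    if sync_ok:
        return Action.KEEP_SYNCING
    if heartbeat_ok:
        # The heartbeat proves the partner is alive, so only one node keeps serving.
        return Action.BLOCK_LOWER_PRIORITY
    # No link answers: the node cannot tell a dead partner from a dead network.
    return Action.ASSUME_PARTNER_DOWN


def split_brain_risk(sync_ok: bool, heartbeat_ok: bool) -> bool:
    """Both nodes run the same logic, so both may keep serving independently."""
    return decide(sync_ok, heartbeat_ok) is Action.ASSUME_PARTNER_DOWN


print(split_brain_risk(sync_ok=False, heartbeat_ok=True))   # False
print(split_brain_risk(sync_ok=False, heartbeat_ok=False))  # True
```

This is why redundant, physically separate heartbeat paths matter: as long as one of them survives, the nodes can agree on which one keeps serving.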