VMFS corruption

aeon
Posts: 12
Joined: Tue Sep 17, 2024 7:57 am

Fri Jul 18, 2025 12:10 pm

Hello. In a 2-node StarWind vSAN (CVM) setup, two 10Gb cables carrying replication and data traffic connect the nodes directly, without a switch. The management ports are connected via a switch.

For testing, I'm running a single virtual machine on the vSAN datastore. To simulate network loss, I unplug both 10Gb cables and the management port of Node 1. The VM tries to fail over to the other node, but most of the time the VMFS becomes corrupted, either immediately or when I reconnect the network cables.

I wonder what I am doing wrong?

Edit: I think this is a split-brain scenario. Please don't tell me to place a quorum node in a different DC. I need a two-node solution.
Last edited by aeon on Fri Jul 18, 2025 1:15 pm, edited 1 time in total.
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Fri Jul 18, 2025 1:13 pm

What you describe looks like a split-brain scenario. Check the logs on each node for any "partner not synchronized" events where the node reporting them stayed synchronized itself.
Do you disconnect all networks?
aeon
Posts: 12
Joined: Tue Sep 17, 2024 7:57 am

Fri Jul 18, 2025 1:23 pm

Yes, I disconnect all cables except the second node's VM networks.
aeon
Posts: 12
Joined: Tue Sep 17, 2024 7:57 am

Fri Jul 18, 2025 1:28 pm

I'm already going to add a quorum virtual machine from another system into this environment. In this case, it doesn't really make sense to use a separate quorum for this system as well.

Is it somehow possible to continue from the second node (without a quorum) in a 2-node setup if there's a network loss scenario?
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Fri Jul 18, 2025 1:46 pm

That was a split brain, then: https://www.starwindsoftware.com/blog/w ... -avoid-it/.
To avoid split brain, we either design systems so that a complete, simultaneous interruption of all networks cannot happen, or suggest using a witness node.
"Is it somehow possible to continue from the second node (without a quorum) in a 2-node setup if there's a network loss scenario?"
Theoretically, with a witness in place, the "connected" node forms the majority together with the witness and remains up, while the HA storage on the "disconnected" node becomes not synchronized and is no longer accessible over iSCSI.
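
To illustrate the majority rule described above, here is a minimal Python sketch. It is not StarWind's actual implementation: the names `votes_seen` and `keeps_storage_online`, the voter labels, and the assumption that every voter has exactly one vote and only counts peers it can reach directly are all hypothetical, chosen just to show the arithmetic.

```python
def votes_seen(node, voters, links_up):
    """Count the votes `node` can collect: its own plus one per directly reachable voter.

    `links_up` is a set of frozenset pairs, one per working link (hypothetical model).
    """
    peers = {v for v in voters if v != node and frozenset((node, v)) in links_up}
    return 1 + len(peers)


def keeps_storage_online(node, voters, links_up):
    """A node keeps serving storage only if it holds a strict majority of all votes."""
    return votes_seen(node, voters, links_up) * 2 > len(voters)


if __name__ == "__main__":
    # Two nodes plus a witness; node1 has lost every link, node2 still sees the witness.
    with_witness = {"node1", "node2", "witness"}
    links = {frozenset(("node2", "witness"))}
    for node in ("node1", "node2"):
        print(node, "stays online:", keeps_storage_online(node, with_witness, links))

    # Two nodes only, all links between them down: each node sees just its own vote.
    two_nodes = {"node1", "node2"}
    for node in ("node1", "node2"):
        print(node, "stays online:", keeps_storage_online(node, two_nodes, set()))
```

In this model, with three voters the isolated node holds 1 of 3 votes and steps down, while the connected node holds 2 of 3 and keeps running. With only two voters, neither side can reach a strict majority once they lose sight of each other, so a majority rule alone cannot pick a survivor; that is why a third vote is suggested.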