VSAN Free not surviving power failures

Software-based VM-centric and flash-friendly VM storage + free version

travsys
Posts: 2
Joined: Fri Dec 03, 2021 10:18 am

Fri Dec 03, 2021 10:38 am

I have set up a 2-node Hyper-V cluster using VSAN Free by following the instructions on this page:
https://www.starwindsoftware.com/resour ... rver-2016/

The hardware is 2 DL360-G9 servers with 128 GB RAM, 4 × 450 GB SAS disks, 4 × 1 Gb network ports and 2 × 10 Gb network ports.
Setup took some time, as I was using the free version and had to use the PowerShell scripts.
In the end it worked fine.

The problem is that the cluster does not survive power failures. After I pull the plug on both servers at once, the Hyper-V cluster won't come up. The VSAN console shows the nodes as not synchronized, and synchronization won't start.
The only way to recover is to run a script that marks the devices on one of the nodes as synchronized; only then does synchronization start.

Is there a way to make the system more resilient to power failures (don't tell me to use a UPS)? Would using 3 nodes be better?

And can we make recovery automatic?

Lex
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Fri Dec 03, 2021 12:02 pm

Disable the write-back cache as described here: https://knowledgebase.starwindsoftware. ... -l1-cache/.
To recover from these situations, follow this guide: https://knowledgebase.starwindsoftware. ... -blackout/
travsys
Posts: 2
Joined: Fri Dec 03, 2021 10:18 am

Mon Dec 06, 2021 2:27 pm

Thanks, disabling the cache made it survive power failures; at least it has not failed so far.

I do have some other issues. When I pull the plug on just one node, that node goes into an isolated state and the VMs running on it become suspended.
Another issue: after a power failure on both nodes, a full sync runs when the nodes come back online.

Any suggestions on how to improve this?
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Mon Dec 06, 2021 3:13 pm

Hi,

These are expected behaviors.
Failover Clustering does not mark the VMs as failed and does not fail them over to the partner immediately. This is expected, because the cluster cannot immediately distinguish a failed node from a disconnected one. See more at https://techcommunity.microsoft.com/t5/ ... a-p/372027. That article has some practical advice on how to shorten the failover threshold.
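As a rough illustration of that threshold: Windows Failover Clustering declares a node down only after it misses a number of consecutive heartbeats, so the detection window is approximately SameSubnetDelay (milliseconds between heartbeats) × SameSubnetThreshold (missed heartbeats tolerated). The sketch below only does that arithmetic. The defaults it uses (1000 ms delay, threshold of 10) are the Windows Server 2016 values as far as I know, so check the actual values on your own cluster (for example with `Get-Cluster | Format-List *Subnet*`) before relying on them. The function name is hypothetical.

```python
# Sketch: approximate how long Failover Clustering waits before treating a
# silent node as failed. SameSubnetDelay / SameSubnetThreshold are real
# cluster properties; the numbers below are assumed Windows Server 2016
# defaults and should be verified on the actual cluster.

def detection_time_seconds(delay_ms: int, threshold: int) -> float:
    """Heartbeat interval (ms) times missed heartbeats tolerated, in seconds."""
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay and threshold must be positive")
    return delay_ms * threshold / 1000.0


if __name__ == "__main__":
    default = detection_time_seconds(1000, 10)  # assumed WS2016 default
    tuned = detection_time_seconds(1000, 5)     # example of a lowered threshold
    print(f"default: {default:.0f} s, tuned: {tuned:.0f} s")
```

Lowering the threshold makes failover start sooner, but it also makes the cluster more likely to evict a node over a brief network hiccup, so it is a trade-off to weigh rather than a free win.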
As for the full sync after a power failure on both nodes:
Again, this is not an issue; it is expected behavior. See more at https://knowledgebase.starwindsoftware. ... may-start/. We are working on mechanisms that could eliminate full synchronization after a power outage, so please stay tuned.