I work for a 24/7 call center and have had great success with the StarWind VSAN product. Being able to move VMs between Hyper-V hosts with minimal downtime and then perform updates has been a lifesaver. A few months ago I had to rebuild a RAID array after a hardware failure. When that was complete, I added the new partner to the VSAN (done with help from this forum - thank you!) and everything has been working well since.
I'm not sure if it is related, but this past weekend I applied Windows updates to the original node (Node2) and failed everything over to the rebuilt node (Node3). Everything was going along nicely. Node2 had been down for about 5 minutes, restarting after the update, when I watched all the VMs go into a non-running state. On the server that was still running (Node3), the iSCSI initiator showed the connections to both Node2 and Node3 as reconnecting. I restarted the StarWind service on Node3 with no result. Node2 came back up after 15 or so minutes, but the HAImage that has most of our VMs on it (HAImage2) remained down.
./getHASyncState on Node2 and Node3 returned "200 Failed: can't find partner node.."
I couldn't find any network errors; each host could ping the other over all the interfaces.
After about 30 minutes, HAImage2 on Node2 went into a Synchronized state and ./getHASyncState showed Node3 synchronizing from Node2 (just the opposite of what I would have expected).
I have been able to bring all the VMs back up and have recovered from the failure for now, but I still need to find the cause. Can someone help point me in the right direction?
Thank you,