2 node vSAN has failed, servers won't connect

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Anatoly (staff), Max (staff)

Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Fri Apr 13, 2018 11:20 am

Hi

I've got a lab setup housing some personal VMs, including an email server. Unfortunately, I've had a couple of crashes due to extended power cuts while I was out, and now the two nodes will not connect to each other. I've reset each server, including the services, but right now neither will work. My GUI license has expired; would it be possible to get an extension? The PowerShell commands are not helping me at all in solving the connection issues.

I've got logs from both nodes which I can send through. Right now, however, none of the VMs are accessible, and it's rather problematic!

Thanks,

Chris
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Apr 13, 2018 12:30 pm

Chris, you need to work out which node stayed synchronized longer (during a power outage the difference may be a matter of seconds) and mark that node as synchronized using one of the scripts from the StarWindX PowerShell samples folder. Here is a custom script based on one of the samples. It marks all devices on the current node as synchronized. To use it for a single device, find the device's name (like "HAimage1") in the corresponding *_HA.swdsk file or the StarWind config file and amend the script accordingly.

Code:

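Import-Module StarWindX

try {
	# Connect to the StarWind service on the local node (default port and credentials).
	$server = New-SWServer -host 127.0.0.1 -port 3261 -user root -password starwind
	$server.Connect()
	# Mark every device on this node as synchronized.
	foreach($device in $server.devices)
	{
		$device.MarkAsSynchronized()
	}
} catch {
	Write-Host "Exception $($_.Exception.Message)" -foreground red
	# Print the device that was being processed when the error occurred.
	$device
} finally {
	# Disconnect even if an error occurred; skip if the server object was never created.
	if ($server) { $server.Disconnect() }
}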
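
For the single-device case mentioned above, a minimal sketch could look like the following. It assumes that each device object exposes a Name property (not confirmed in this thread) and uses "HAimage1" only as a placeholder; take the real name from the *_HA.swdsk file or the StarWind config file.

```powershell
Import-Module StarWindX

# Placeholder name; replace with the name found in the *_HA.swdsk file or the StarWind config file.
$targetName = "HAimage1"

try {
	$server = New-SWServer -host 127.0.0.1 -port 3261 -user root -password starwind
	$server.Connect()
	# Assumption: device objects expose a Name property that matches the configured device name.
	$device = $server.devices | Where-Object { $_.Name -eq $targetName }
	if ($null -eq $device) {
		Write-Host "Device $targetName not found on this node" -foreground red
	} else {
		$device.MarkAsSynchronized()
	}
} catch {
	Write-Host "Exception $($_.Exception.Message)" -foreground red
} finally {
	# Disconnect even if an error occurred; skip if the server object was never created.
	if ($server) { $server.Disconnect() }
}
```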

The proper way to proceed in your case is to stop and disable the StarWind service on the node that you think was switched off earlier, run the script on the node that remains online, and check the data consistency there. If that node holds the most recent data (as confirmed by your own check), simply enable and start the service on the partner node; the service will then synchronize the partner's devices from the current node.

If the node you picked as the one with the most recent data turns out to be the wrong one, stop the StarWind service on it, enable and start the StarWind service on the partner node, run the above script there, and check the data consistency. When you are ready to start the synchronization, enable and start the StarWind service on the other node. The synchronization will then run from the correct node.
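
The service steps described above can be sketched with the standard Windows service cmdlets. This is only an outline under one assumption: the service name "StarWindService" is a guess and should be confirmed first, for example with Get-Service *StarWind*. Run it on the node believed to hold the older data.

```powershell
# Assumption: the StarWind service is registered under this name; verify with: Get-Service *StarWind*
$svc = "StarWindService"

# Step 1: on the node believed to hold the OLDER data, stop the service and keep it from starting.
Stop-Service -Name $svc -Force
Set-Service -Name $svc -StartupType Disabled

# Step 2 (manual, on the OTHER node): run the mark-as-synchronized script and check data consistency.

# Step 3: once the other node's data is confirmed good, re-enable and start the service here
# so that this node synchronizes from the other node.
Set-Service -Name $svc -StartupType Automatic
Start-Service -Name $svc
```

If the consistency check fails, the same sequence applies with the two nodes' roles swapped.
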
Feel free to let me know if you need assistance investigating the logs.