Node Synchronization after Power failure to both nodes

JamesWCA
Posts: 3
Joined: Mon Aug 17, 2020 7:52 pm

Mon Aug 17, 2020 8:59 pm

We had a power event and both nodes were gracefully shut down by the UPS. After coming back up, StarWind Management Console reports that all HAImages are Synchronized. We use a PHP Nagios plugin script to keep an eye on our StarWind VSAN; the script reports that the partner node is 100% synchronized but the local node is 0% synchronized. We are using StarWind Free, so I only have read access to the Console.

What can I do to make sure both nodes are synchronized so that our monitoring script reports correctly?
Screenshots and script attached.
Starwind.png (67.82 KiB)

Code: Select all

#!/usr/bin/php -c /etc/php.ini
<?php
error_reporting(0);

// All five arguments are required.
if (!isset($argv[1]) || !isset($argv[2]) || !isset($argv[3]) || !isset($argv[4]) || !isset($argv[5]))
{
    echo "Please, run with parameters: <IPStarwindServer1> <ISCSI port> <Starwind user> <Starwind password> <Starwind HA device name>\nExample: ./check_stardwind_health.php 192.168.3.1 3261 root starwind HAImage1\n";
    exit(3); // Nagios UNKNOWN: the check could not run
}

$Server=$argv[1];
$Port=$argv[2];
$Login=$argv[3];
$Password=$argv[4];
$Device=$argv[5];

$cfgTimeOut = 20; //Session timeout

$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
if ($socket === false) {
    echo "CRITICAL: Cannot execute socket_create(). Reason: " . socket_strerror(socket_last_error()) . "\n";
	exit(2);
}

//Try to connect to Starwind iSCSI server:
$result = socket_connect($socket, $Server, $Port);
if ($result === false) {
    echo "CRITICAL: Cannot execute socket_connect() to ".$Server.":".$Port.". Reason: " . socket_strerror(socket_last_error($socket)) . "\n";
	exit(2);
}

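// The raw socket above was only a reachability probe; close it before opening the stream.
socket_close($socket);

$telnet = @fsockopen($Server, $Port, $errno, $errstr, $cfgTimeOut);
if ($telnet === false) {
    echo "CRITICAL: Cannot open connection to ".$Server.":".$Port.". Reason: (".$errno.") ".$errstr."\n";
    exit(2);
}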

fputs ($telnet, "login $Login $Password\r\n");
fputs ($telnet, "list\r\n");
fputs ($telnet, "exit\r\n");

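$counter = 0;       // index of the device block currently being parsed
$values = array();  // parsed key="value" pairs, grouped per device

while (!feof($telnet))
{
    $string = fgets($telnet);
    if ($string === false) {
        break;
    }
    // Split on the first "=" only, so values that contain "=" stay intact.
    $explode = explode("=", $string, 2);
    if ($explode[0] == "DeviceName")
    {
        $counter++;
    }
    if (isset($explode[1]))
    {
        $values[$counter][$explode[0]] = str_replace("\"", "", $explode[1]);
    }
}
fclose($telnet);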

//Debug:
//print_r ($values);

// Return the status fields of the named HA device, or null if it is not in the list.
function findDevice($arr,$Device){
    foreach($arr as $sample){
        // Strip the trailing CR/LF that fgets() leaves on every value.
        $sample = preg_replace('~[\r\n]+~', '', $sample);
        if(isset($sample['DeviceName']) && $sample['DeviceName']==$Device){
            return array($sample['DeviceName'],$sample['DeviceId'],$sample['state'],$sample['ha_synch_status'],$sample['ha_synch_percent'],$sample['ha_partner_node1_sync_status'],$sample['ha_partner_node1_is_exist_heartbeat_valid_connection'],$sample['ha_partner_node1_is_exist_sync_valid_connection'],$sample['ha_partner_node1_heartbeat_channels'],$sample['ha_partner_node1_sync_channels'],$sample['ha_partner_node1_sync_percent']);
        }
    }
    return null;
}

if(($result=findDevice($values,$Device))!=null){
	$DevName=$result[0];
	$DevID=$result[1];
	$DevState=$result[2];
	$DevHASyncStatus=$result[3];
	$DevHASyncPerc=$result[4];
	$DevHANode1SyncStatus=$result[5];
	$DevHANode1HeartBeatConn=$result[6];
	$DevHANode1SyncConn=$result[7];
	$DevHANode1HeartBeatChannels=str_replace('$',':',$result[8]);
	$DevHANode1SyncChannels=str_replace('$',':',$result[9]);
	$DevHANode1SyncPerc=$result[10];

	/* print "--- RESPONSE ---\n";
	print "DevName: ".$DevName."\n";
	print "DevID: ".$DevID."\n";
	print "DevState: ".$DevState."\n";
	print "DevHASyncStatus: ".$DevHASyncStatus."\n";
	print "DevHASyncPerc: ".$DevHASyncPerc."\n";
	print "DevHANode1SyncStatus: ".$DevHANode1SyncStatus."\n";
	print "DevHANode1HeartBeatConn: ".$DevHANode1HeartBeatConn."\n";
	print "DevHANode1SyncConn: ".$DevHANode1SyncConn."\n";
	print "DevHANode1HeartBeatChannels: ".$DevHANode1HeartBeatChannels."\n";
	print "DevHANode1SyncChannels: ".$DevHANode1SyncChannels."\n";
	print "--- RESPONSE END ---\n";
	*/

	// Any non-zero device state is treated as a fault.
	if($DevState == 0){
	$DevState="OK";
	}else{
	$DevState="BAD";
	}

	if($DevHASyncStatus == 1){
	$DevHASyncStatus="Synchronized";
	}elseif($DevHASyncStatus == 2){
	$DevHASyncStatus="Synchronizing";
	}else{
	$DevHASyncStatus=$result[3];
	}

	// Heartbeat connection: 1 means established, anything else is treated as disconnected.
	if($DevHANode1HeartBeatConn == 1){
	$DevHANode1HeartBeatConn = "established";
	}else{
	$DevHANode1HeartBeatConn = "disconnected";
	}

	// Sync connection: 1 means established, anything else is treated as disconnected.
	if($DevHANode1SyncConn == 1){
	$DevHANode1SyncConn = "established";
	}else{
	$DevHANode1SyncConn = "disconnected";
	}

	if($DevHANode1SyncStatus == 1){
	$DevHANode1SyncStatus = "synchronized";
	}elseif($DevHANode1SyncStatus == 2){
	$DevHANode1SyncStatus = "synchronizing";
	}elseif($DevHANode1SyncStatus == 0){
	$DevHANode1SyncStatus = "DOWN";
	}elseif($DevHANode1SyncStatus == 3){
	$DevHANode1SyncStatus = "NOT synchronized";
	}else{
	$DevHANode1SyncStatus = $result[5];
	}

	//Everything is OK:
	if(($DevState == "OK") AND ($DevHASyncStatus == "Synchronized") AND ($DevHASyncPerc == 100) AND ($DevHANode1SyncStatus == "synchronized") AND ($DevHANode1HeartBeatConn == "established") AND ($DevHANode1SyncConn == "established")){
		echo "OK: VSAN Volume ".$DevName. " health is OK. ".$DevHASyncStatus." ".$DevHASyncPerc."%. Details:\nPartner is ".$DevHANode1SyncStatus." (".$DevHANode1SyncPerc."%).\nHeartbeat Channels ".$DevHANode1HeartBeatChannels." is ".$DevHANode1HeartBeatConn.".\nSync Channels ".$DevHANode1SyncChannels." is ".$DevHANode1SyncConn.".\n";
		exit(0);
	//Condition for the primary node while the secondary node is syncing:
	}elseif(($DevState == "OK") AND ($DevHASyncStatus == "Synchronized") AND ($DevHASyncPerc == 100) AND ($DevHANode1SyncStatus == "synchronized" OR $DevHANode1SyncStatus == "synchronizing") AND ($DevHANode1HeartBeatConn == "established") AND ($DevHANode1SyncConn == "established")){
		echo "OK: VSAN Volume ".$DevName. " health is OK. ".$DevHASyncStatus." ".$DevHASyncPerc."%. Details:\nPartner is ".$DevHANode1SyncStatus." (".$DevHANode1SyncPerc."%).\nHeartbeat Channels ".$DevHANode1HeartBeatChannels." is ".$DevHANode1HeartBeatConn.".\nSync Channels ".$DevHANode1SyncChannels." is ".$DevHANode1SyncConn.".\n";
		exit(0); 
	//Condition for secondary node in syncing state:
	}elseif(($DevState == "OK") AND ($DevHASyncStatus == "Synchronizing") AND ($DevHASyncPerc != 100) AND ($DevHANode1SyncStatus == "synchronized" OR $DevHANode1SyncStatus == "synchronizing") AND ($DevHANode1HeartBeatConn == "established") AND ($DevHANode1SyncConn == "established")){
		echo "WARNING: VSAN Volume ".$DevName. " health is WARNING. ".$DevHASyncStatus." ".$DevHASyncPerc."%. Details:\nPartner is ".$DevHANode1SyncStatus." (".$DevHANode1SyncPerc."%).\nHeartbeat Channels ".$DevHANode1HeartBeatChannels." is ".$DevHANode1HeartBeatConn.".\nSync Channels ".$DevHANode1SyncChannels." is ".$DevHANode1SyncConn.".\n";
		exit(1); 
	//Condition when the partner is unreachable but the VSAN image is still working.
	//The partner status checks are joined with AND; with OR the test was always true.
	}elseif(($DevState == "OK") AND ($DevHASyncStatus == "Synchronized") AND ($DevHASyncPerc == 100) AND ($DevHANode1SyncStatus != "synchronized" AND $DevHANode1SyncStatus != "synchronizing") AND ($DevHANode1HeartBeatConn != "established") AND ($DevHANode1SyncConn != "established")){
		echo "WARNING: VSAN Volume ".$DevName. " health is WARNING. ".$DevHASyncStatus." ".$DevHASyncPerc."%. Details:\nPartner is ".$DevHANode1SyncStatus." (".$DevHANode1SyncPerc."%).\nHeartbeat Channels ".$DevHANode1HeartBeatChannels." is ".$DevHANode1HeartBeatConn.".\nSync Channels ".$DevHANode1SyncChannels." is ".$DevHANode1SyncConn.".\n";
		exit(1); 
	//All other cases are critical:
	}else{
		echo "CRITICAL: VSAN Volume ".$DevName. " health is CRITICAL. ".$DevHASyncStatus." ".$DevHASyncPerc."%. Details:\nPartner is ".$DevHANode1SyncStatus." (".$DevHANode1SyncPerc."%).\nHeartbeat Channels ".$DevHANode1HeartBeatChannels." is ".$DevHANode1HeartBeatConn.".\nSync Channels ".$DevHANode1SyncChannels." is ".$DevHANode1SyncConn.".\n";
		exit(2); 
	}	

}else{
 echo "UNKNOWN: Device ".$Device." is not found! Check login and password or check HA Device name.\n";
 exit(3);
}

?>
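
To see which raw fields the script is reading, the parsing step can be run against captured output without a live connection. The following is a minimal sketch, assuming the `list` response consists of `key="value"` lines with each device block starting at `DeviceName`, as the loop in the script implies. The function name `parseDeviceList` and the sample response text are hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: parse a captured "list" response into one array per device.
// Assumes key="value" lines and that each device block starts with DeviceName,
// which is what the parsing loop in the script above relies on.
function parseDeviceList(string $response): array
{
    $devices = array();
    $current = -1;
    foreach (preg_split('~\r\n|\r|\n~', $response) as $line) {
        // Split on the first "=" only so values containing "=" survive.
        $parts = explode('=', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a key=value line
        }
        list($key, $value) = $parts;
        if ($key === 'DeviceName') {
            $current++;
            $devices[$current] = array();
        }
        if ($current >= 0) {
            // Drop surrounding quotes and whitespace from the value.
            $devices[$current][$key] = trim($value, "\" \t");
        }
    }
    return $devices;
}

// Made-up sample response, for illustration only.
$sample = "DeviceName=\"HAImage1\"\r\n"
        . "ha_synch_status=\"1\"\r\n"
        . "ha_synch_percent=\"100\"\r\n"
        . "ha_partner_node1_sync_percent=\"100\"\r\n"
        . "DeviceName=\"HAImage2\"\r\n"
        . "ha_synch_status=\"2\"\r\n"
        . "ha_synch_percent=\"40\"\r\n";

$devices = parseDeviceList($sample);
foreach ($devices as $device) {
    echo $device['DeviceName'] . ': local ' . $device['ha_synch_percent'] . "%\n";
}
```

Printing the parsed array for the real response (the commented-out `print_r` in the script does the same) shows whether `ha_synch_percent` itself comes back as 0 or whether the value is lost during parsing.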
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Mon Aug 17, 2020 9:29 pm

Hello James and welcome to StarWind Forum!
Please note that StarWind Support does not develop custom scripts. I can still advise you on how to check the synchronization state by means of StarWindX. Run SyncHaDevice.ps1 from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell. The output will show whether the HA devices are synchronized. Run the script for each HA device.

Let me know if any assistance is required.
JamesWCA
Posts: 3
Joined: Mon Aug 17, 2020 7:52 pm

Mon Aug 17, 2020 10:33 pm

SyncHADevice.ps1 and SyncHaDeviceAdvanced.ps1 both report all devices synchronized on both nodes. Any idea why the monitoring script might be reading something different?
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Tue Aug 18, 2020 3:30 am

If the StarWindX scripts report that they are all synchronized, there is nothing to worry about. Could you tell me when the script is triggered?
Try restarting the StarWind service on each node. The procedure is much like the one described here: https://knowledgebase.starwindsoftware. ... installed/.
If the issue persists, let me know, please.

1. Check that all StarWind HA devices have the “Synchronized” status on all servers;
2. Check that all CSVs have active paths from all StarWind servers (go to iSCSI Initiator);
3. Move all cluster resources (VMs and roles) off the server that is going to be restarted;
4. Restart the StarWind service on that server;
5. Wait until the StarWind VSAN service starts and the synchronization process finishes;
6. Check that all StarWind HA devices have the “Synchronized” status on all servers using the StarWindX script;
7. Check that all CSVs have active paths from all StarWind servers (go to iSCSI Initiator);
8. Repeat the above steps for each remaining server.
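
For anyone relying on the PHP monitoring script from the first post rather than StarWindX, the check in steps 1 and 6 could be sketched as below. It assumes that `ha_synch_status` equal to 1 means Synchronized, which is how that script interprets the field and is not taken from StarWind documentation. The function name `unsynchronizedDevices` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: return the names of HA devices that are not Synchronized.
// Assumption: ha_synch_status "1" means Synchronized (taken from the monitoring
// script earlier in this thread, not from StarWind documentation).
function unsynchronizedDevices(array $devices): array
{
    $pending = array();
    foreach ($devices as $device) {
        // Skip entries without HA status fields (for example non-HA devices).
        if (!isset($device['DeviceName'], $device['ha_synch_status'])) {
            continue;
        }
        if ((string)$device['ha_synch_status'] !== '1') {
            $pending[] = $device['DeviceName'];
        }
    }
    return $pending;
}

// Made-up parsed data, for illustration only.
$devices = array(
    array('DeviceName' => 'HAImage1', 'ha_synch_status' => '1'),
    array('DeviceName' => 'HAImage2', 'ha_synch_status' => '2'),
);

$pending = unsynchronizedDevices($devices);
if (count($pending) === 0) {
    echo "All HA devices are synchronized; safe to continue.\n";
} else {
    echo "Do not restart yet, still not synchronized: " . implode(', ', $pending) . "\n";
}
```

Running a check like this on every node before step 4 and again at step 6 keeps a service restart from starting while any device is still synchronizing.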
JamesWCA
Posts: 3
Joined: Mon Aug 17, 2020 7:52 pm

Thu Sep 03, 2020 12:06 am

Moving all VMs and restarting the StarWind VSAN service cleared the error with the monitoring script. Thanks for the help.
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Thu Sep 03, 2020 12:18 pm

Awesome, glad to know that it worked in the end :)
Do not hesitate to contact us if you have additional questions.