Synched but not synched, how to resolve?

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Sun Oct 11, 2020 9:40 pm

I have a rather odd synch issue, and I can't find any information on how to resolve it.

This is a two-node hyper-converged cluster that I abused by moving Ethernet cables between two connected Ethernet switches while it was running. It went into synchronization. When the dust cleared and it finished resynchronizing, the cluster was down. Here's what I can't figure out how to resolve.

One of the servers says all devices are active (state 0), synchronized (status 1) with synch percent 100%.
The other server is the same (i.e., synchronized), except that synch percent is zero (0%).

What's weird is that the CSV cluster shared volumes appear to be synchronized, and synch is active. I can put test files on one server and they show up on the other immediately. A quick check shows the same number of bytes used on both servers. But my Windows Failover Cluster Manager is having none of it. Not happy at all.

I've checked my iSCSI targets, all looks good there, too -- connected both ways.

I've resisted the temptation to try to force synchronized status -- which doesn't really make sense to do, since both servers claim to be synchronized on the basis of status, if not percent.

I'm attaching a copy of the latest version of my customized status script and its output. If you like the script, feel free to use it: it makes several things easier.

Like I say, I can't find any discussion or documentation that covers this situation. Help!

Here's the output from both servers:

Code: Select all

Server HV3
------------- Running ---------------

DeviceName: HAImage1 Sync Status: 1 -- TargetName: iqn.2008-08.com.starwindsoftware:kmhv3.kmsi.net-witness
Device state: 0 (Active)
Synchronized: 100%
ha_partner_nodes_count: 1
ha_partner_node1_sync_status: 1
ha_partner_node1_sync_percent: 0
ha_partner_node1_is_exist_sync_valid_connection: 1
ha_partner_node1_is_exist_heartbeat_valid_connection: 1

DeviceName: HAImage2 Sync Status: 1 -- TargetName: iqn.2008-08.com.starwindsoftware:kmhv3-csv1
Device state: 0 (Active)
Synchronized: 100%
ha_partner_nodes_count: 1
ha_partner_node1_sync_status: 1
ha_partner_node1_sync_percent: 0
ha_partner_node1_is_exist_sync_valid_connection: 1
ha_partner_node1_is_exist_heartbeat_valid_connection: 1

Server HV4
------------- Running ---------------

DeviceName: HAImage1 Sync Status: 1 -- TargetName: iqn.2008-08.com.starwindsoftware:kmhv4-witness
Device state: 0 (Active)
Synchronized: 0%
ha_partner_nodes_count: 1
ha_partner_node1_sync_status: 1
ha_partner_node1_sync_percent: 100
ha_partner_node1_is_exist_sync_valid_connection: 1
ha_partner_node1_is_exist_heartbeat_valid_connection: 1

DeviceName: HAImage2 Sync Status: 1 -- TargetName: iqn.2008-08.com.starwindsoftware:kmhv4.kmsi.net-csv1
Device state: 0 (Active)
Synchronized: 0%
ha_partner_nodes_count: 1
ha_partner_node1_sync_status: 1
ha_partner_node1_sync_percent: 100
ha_partner_node1_is_exist_sync_valid_connection: 1
ha_partner_node1_is_exist_heartbeat_valid_connection: 1
Here's the script, if you're interested:

Code: Select all

#
# The following example shows how to get the synchronization status of every HA device
# and, if needed, run synchronization (the synchronization commands are commented out below)
#
Import-Module StarWindX

while ($true) {

"------------- Running ---------------"

try
{
    #
    # connect to the server
    #
    $server = New-SWServer -host 127.0.0.1 -port 3261 -user root -password starwind

    $server.Connect()

    #
    # Try to find specified device
    #
    $deviceName = "*"
    $partnerTargetName = "*"
    $deviceFound = $false
    
    foreach($device in $server.devices)
    {
            Write-Host ""
            Write-Host -NoNewline "DeviceName:" $device.GetPropertyValue("DeviceName") -foreground yellow
                        
            $device.Refresh()
            
            $syncState = $device.GetPropertyValue("ha_synch_status")
            
            Write-Host -NoNewline "Sync State: " $syncState
            
            If ( $syncState -ne "" )
            {
                Write-Host " -- TargetName:" $device.GetPropertyValue("TargetName") -foreground yellow

                $state = $device.GetPropertyValue("state")
                Write-Host -NoNewline "Device state:" $device.GetPropertyValue("state") 
             
                switch ($state)
                {
                    0 { " (Active)" }
                    1 { " (NonActive)" }
                    2 { " (NotLicensed)" }
                    3 { " (Disabled)" }
                    default { "(Undefined)" }
                }

               $waitForAutoSync = $device.GetPropertyValue("ha_wait_on_autosynch")
                
                if ( $waitForAutoSync -eq "1" )
                {
                    Write-Host "Waiting for autosynchronization..." -foreground yellow
                }
                else
                {
                    if ( $syncState -eq "1" )
                    {
                        #
                        # Device is synchronized. Get synchronization percent and show it
                        #
                        $syncPercent = $device.GetPropertyValue("ha_synch_percent")
                        
                        Write-Host "Synchronized: $($syncPercent)%" -foreground yellow
                    }
                    if ( $syncState -eq "2" )
                    {
                        #
                        # Device is synchronizing. Get synchronization percent and show it
                        #
                        $syncPercent = $device.GetPropertyValue("ha_synch_percent")
                        
                        Write-Host "Synchronizing: $($syncPercent)%" -foreground yellow
                    }
                    
                    if ( $syncState -eq "3" )
                    {
                        #
                        # Device not synchronized. Synchronize current node from partner
                        #
                        Write-Host "Device not synchronized."
                        
                        # Synchronize current node from partner '$($partnerTargetName)'" -foreground yellow

                       # $params = new-object -ComObject StarWindX.Parameters        
                       # $params.AppendParam("deviceID",$device.DeviceId)
                       # $params.AppendParam("partnetTargetName",$partnerTargetName)
                        
                       # $server.ExecuteCommand( 0, "restoreHAPartnerNode", $params)
                        
                        #
                        # If you want to synchronize partners from current node you can comment out code above and uncomment section below
                        #
                        
                        # Device not synchronized. Mark current node as 'Synchronized'. 
                        # WARNING, Command changes Device Status to "Synchronized" without Data Synchronization with HA (High Availability) Partner, 
                        # Device will start processing Client Requests immediately and will be used as Data Synchronization Source for Partner Device.
                        #
                        #Write-Host "Device not synchronized. Mark current node as 'Synchronized'. " -foreground yellow

                        #$params = new-object -ComObject StarWindX.Parameters        
                        #$params.AppendParam("deviceID",$device.DeviceId)
                        
                        #$server.ExecuteCommand( 0, "restoreCurrentHANode", $params)
                        
                       # Start-Sleep -m 5000
                    }
                }
#=======
                Write-Host "ha_partner_nodes_count:" $device.GetPropertyValue("ha_partner_nodes_count")
                Write-Host "ha_partner_node1_sync_status:" $device.GetPropertyValue("ha_partner_node1_sync_status")
                Write-Host "ha_partner_node1_sync_percent:" $device.GetPropertyValue("ha_partner_node1_sync_percent")
                Write-Host "ha_partner_node1_is_exist_sync_valid_connection:" $device.GetPropertyValue("ha_partner_node1_is_exist_sync_valid_connection")
                Write-Host "ha_partner_node1_is_exist_heartbeat_valid_connection:" $device.GetPropertyValue("ha_partner_node1_is_exist_heartbeat_valid_connection")
                
               # Write-Host "ha_partner_node2_sync_status:" $device.GetPropertyValue("ha_partner_node2_sync_status")
               # Write-Host "ha_partner_node2_sync_percent:" $device.GetPropertyValue("ha_partner_node2_sync_percent")

            } 
            else
            {
                write-host -NoNewLine " -- no info"
            }
    }
    
#    if ( $deviceFound -ne $true )
#    {
#        Write-Host "$($deviceName) not found" -foreground red
#    }

}
catch
{
    Write-Host "Exception $($_.Exception.Message)" -foreground red 
}

Write-Host ""

$server.Disconnect( )
pause

}
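
A possible companion to the script above, offered only as a minimal sketch: a small helper that flags the exact situation described in this post, where the sync status reads 1 (synchronized) but the percent is not 100. The function name `Test-HASyncConsistency` is hypothetical and not part of StarWindX. It assumes the status codes used in the script's comments (1 = synchronized, 2 = synchronizing, 3 = not synchronized) and takes the raw strings returned by `GetPropertyValue("ha_synch_status")` and `GetPropertyValue("ha_synch_percent")`.

```powershell
# Hypothetical helper (not part of StarWindX): classifies a status/percent pair.
# Assumed codes, taken from the script above: "1" = synchronized,
# "2" = synchronizing, "3" = not synchronized.
function Test-HASyncConsistency {
    param(
        [string]$SyncStatus,   # raw value of "ha_synch_status"
        [string]$SyncPercent   # raw value of "ha_synch_percent"
    )

    $percent = 0
    # TryParse leaves $percent at 0 when the value is empty or non-numeric
    [void][int]::TryParse($SyncPercent, [ref]$percent)

    switch ($SyncStatus) {
        "1" {
            if ($percent -eq 100) { "OK: synchronized" }
            else { "MISMATCH: status says synchronized, but percent is $percent" }
        }
        "2" { "Synchronizing: $($percent)%" }
        "3" { "Not synchronized" }
        default { "Unknown status '$SyncStatus'" }
    }
}

# The two readings reported above (HV3, then HV4)
Test-HASyncConsistency -SyncStatus "1" -SyncPercent "100"
Test-HASyncConsistency -SyncStatus "1" -SyncPercent "0"
```

Inside the `foreach` loop, the same call could be made with `$syncState` and the value of `ha_synch_percent`, so that a mismatch stands out instead of being printed as "Synchronized: 0%".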
------------------------
"In theory, theory and practice are the same, but in practice they're not." -- Yogi Berra
yaroslav (staff)
Staff
Posts: 2355
Joined: Mon Nov 18, 2019 11:11 am

Mon Oct 12, 2020 3:23 am

Greetings,

Thanks for your question. So, your concern is that the script reports the device as "not synchronized" even though you are sure the devices are synchronized on both ends? Please restart the service on the "not synchronized" side.
Could you tell me which StarWind version you use? Just open the latest log file; the version should be written somewhere near the top of the file.
wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Mon Oct 12, 2020 5:14 pm

Thanks for getting back to me, Yaroslav!

Well, I wasn't totally sure if they were fully synchronized or not, when sync status is 1 -- which apparently means Synchronized -- but the percentage is zero. It _acts_ synchronized, and there is no resynch going on. The other server looks 100% normal. So, I was hoping you could tell me what to trust.

Anyway, I followed your advice and stopped and restarted the StarWind Virtual SAN service on the server with the anomalous readings. The service came up quickly without any resynch, and the numbers now look right. Thanks!

I'm running StarWind Virtual SAN v8.0.0 (Build 11456, [SwSAN], Win64) over Windows Server 2016.

--- kenw
yaroslav (staff)
Staff
Posts: 2355
Joined: Mon Nov 18, 2019 11:11 am

Mon Oct 12, 2020 6:20 pm

Hey,

"I'm running StarWind Virtual SAN v8.0.0 (Build 11456, [SwSAN], Win64) over Windows Server 2016."

That's an old one; please consider updating to the latest build. Here is the procedure: https://knowledgebase.starwindsoftware. ... d-version/.

Could you share the logs from both servers with me here? Please collect them with our tool (https://knowledgebase.starwindsoftware. ... collector/) and upload them to Google Drive or OneDrive.
I have seen this before, but I need the logs to say if the service can be restarted.
wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Mon Oct 12, 2020 6:31 pm

By the way, a nice thing about the script I provided is that it should not require any editing AT ALL to run on most StarWind VSAN servers, and provides quite detailed information. I think many people might find that useful.
wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Mon Oct 12, 2020 7:05 pm

Thanks, Yaroslav! I've sent you a PM with the OneDrive links to the log files.
yaroslav (staff)
Staff
Posts: 2355
Joined: Mon Nov 18, 2019 11:11 am

Mon Oct 12, 2020 8:48 pm

Greetings,

All HA devices are synchronized on both sides. Just go ahead and restart the service on node2. Once it has restarted, try running the script again.
Let me know if any assistance is required.
wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Mon Oct 12, 2020 9:50 pm

Yes, by the time I'd collected those logs, I'd already done that as per your recommendation on Sun Oct 11, 2020 8:23 pm, which I reported the next morning.

Anyway, it worked well, as you saw. I'm looking at doing an update now, as you also recommended.

Thanks for your help, Yaroslav!
yaroslav (staff)
Staff
Posts: 2355
Joined: Mon Nov 18, 2019 11:11 am

Mon Oct 12, 2020 10:18 pm

You are always welcome. Let us know if any assistance is required.