Greetings.
The script periodically tries to retrieve the synchronization status and displays it to the user.
I believe I have solved the riddle of the synchronization status. The output has to do with the HA device priority. If you look carefully at the outputs you have, one reports 100% synchronized for the local device, while the other reports 0% synchronized for its local device. The HA device priority in your setup is mixed up.
Here is how you can change a StarWind HA device priority.
1) Make sure that you shut down StarWind VSAN services. Just go to Task Manager and disable everything related to StarWind.
2) Go to the device folder and open the corresponding *_HA.swdsk as a text file (say, with Notepad++, or Notepad). Your device name is vsan-02.
Find the section that looks like this:
<node id="2" name="iqn.2008-08.com.starwindsoftware:10.255.255.252-vsan-02" shut="false" active="true">
<storages>
<storage_ref id="2"/>
</storages>
<parameters>
<type>1</type>
<priority>1</priority> - OWNER NODE PARAMETER
<sync_status>1</sync_status>
</parameters>
</node>
<node id="3" name="iqn.2008-08.com.starwindsoftware:<node name>-vsan-01" shut="false" active="true">
<storages>
<storage_ref id="3"/>
</storages>
<parameters>
<type>1</type>
<priority>0</priority> - PARTNER NODE PARAMETER
<sync_status>1</sync_status>
</parameters>
</node>
Change the owner node parameter from <priority>1</priority> to <priority>0</priority>, and the partner node parameter from <priority>0</priority> to <priority>1</priority>. Save the file. Then re-enable all the StarWind services that you disabled during the preparation phase.
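The swap described above can be sketched in Python. This is only an illustration on a trimmed-down fragment: the <nodes> wrapper element and the short node names are assumptions made so the example is self-contained, and the real *_HA.swdsk file contains more than is shown here. Back up the original file before touching it, and keep the services stopped while editing.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the section above.
# The <nodes> wrapper and the node names are assumptions for illustration only.
FRAGMENT = """
<nodes>
  <node id="2" name="owner-vsan-02" shut="false" active="true">
    <parameters>
      <type>1</type>
      <priority>1</priority>
      <sync_status>1</sync_status>
    </parameters>
  </node>
  <node id="3" name="partner-vsan-01" shut="false" active="true">
    <parameters>
      <type>1</type>
      <priority>0</priority>
      <sync_status>1</sync_status>
    </parameters>
  </node>
</nodes>
"""


def read_priorities(xml_text):
    """Map each node name to the text of its <priority> element."""
    root = ET.fromstring(xml_text)
    return {
        node.get("name"): node.findtext("parameters/priority")
        for node in root.findall("./node")
    }


def swap_priorities(xml_text):
    """Return the XML text with the two nodes' <priority> values exchanged."""
    root = ET.fromstring(xml_text)
    priorities = root.findall("./node/parameters/priority")
    if len(priorities) != 2:
        raise ValueError("expected exactly two <node> entries with a <priority>")
    first, second = priorities
    first.text, second.text = second.text, first.text
    return ET.tostring(root, encoding="unicode")


swapped = swap_priorities(FRAGMENT)
print(read_priorities(FRAGMENT))
print(read_priorities(swapped))
```

The function refuses to act unless it finds exactly two priority entries, so an unexpected file layout fails loudly instead of being edited blindly.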
Now, go to the vsan-01 host (I believe that is the name of the partner node).
1) Make sure that you shut down StarWind VSAN services. Just go to the Task Manager and disable all services related to StarWind.
2) Go to the device folder and open the corresponding *_HA.swdsk as a text file (say, with Notepad++ or Notepad). Your device name is vsan-01.
Find the section that looks like this:
<node id="2" name="iqn.2008-08.com.starwindsoftware:<node name>-vsan-01" shut="false" active="true">
<storages>
<storage_ref id="2"/>
</storages>
<parameters>
<type>1</type>
<priority>0</priority> - OWNER NODE PARAMETER
<sync_status>1</sync_status>
</parameters>
</node>
<node id="3" name="iqn.2008-08.com.starwindsoftware:10.255.255.252-vsan-02" shut="false" active="true">
<storages>
<storage_ref id="3"/>
</storages>
<parameters>
<type>1</type>
<priority>1</priority> - PARTNER NODE PARAMETER
<sync_status>1</sync_status>
</parameters>
</node>
Change the owner node parameter from <priority>0</priority> to <priority>1</priority>, and the partner node parameter from <priority>1</priority> to <priority>0</priority>. Save the file. Re-enable all the StarWind services and open the StarWind Management Console to verify that the changes were applied successfully.
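In the two fragments above, each target carries the same priority value in both hosts' files. Assuming that is the intended state after the edit as well, a quick cross-check of the two files could look like the sketch below. The <nodes> wrapper, the target names, and the helper names are hypothetical; this is not a StarWind tool, just a way to compare two copies of the section.

```python
import xml.etree.ElementTree as ET

# Hypothetical post-change fragments, one per host. Names are placeholders.
HEADER_ON_VSAN_02 = """
<nodes>
  <node id="2" name="target-vsan-02"><parameters><priority>0</priority></parameters></node>
  <node id="3" name="target-vsan-01"><parameters><priority>1</priority></parameters></node>
</nodes>
"""

HEADER_ON_VSAN_01 = """
<nodes>
  <node id="2" name="target-vsan-01"><parameters><priority>1</priority></parameters></node>
  <node id="3" name="target-vsan-02"><parameters><priority>0</priority></parameters></node>
</nodes>
"""


def priorities_by_target(xml_text):
    """Map each target name (whitespace trimmed) to its integer priority."""
    root = ET.fromstring(xml_text)
    return {
        node.get("name").strip(): int(node.findtext("parameters/priority"))
        for node in root.iter("node")
    }


def headers_agree(header_a, header_b):
    """True when both files assign the same priority to every target."""
    return priorities_by_target(header_a) == priorities_by_target(header_b)


print(headers_agree(HEADER_ON_VSAN_02, HEADER_ON_VSAN_01))
```

The comparison is keyed on the target name rather than the node id, because the same target appears under a different id on each host.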
From <https://starwindhelp.zendesk.com/agent/tickets/204924>
PLEASE DO THAT ONLY FOR THE WITNESS DEVICE!!! Once you make the change, the active primary (KMHV3) will report 100% synchronized for all devices, and the secondary (KMHV4) will report 0% synchronized, as it is not the primary.
I also did a config review of your headers.
There is a misconfiguration: 10.4.34.x is used for both iSCSI and synchronization. Please remove it from the synchronization interfaces.
Could you also tell me what all the connections are for?
10.4.35.x, 10.4.34.x, and 10.4.36.x are for synchronization, while 10.4.34.x is also used for iSCSI. We recommend using dedicated channels for each type of traffic.
Regarding the cluster failures, the logs show chkdsk runs:
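A shared channel like this is easy to spot programmatically. The sketch below uses Python's standard ipaddress module and assumes each 10.4.x.x range is a /24 network, which the notes above do not state. The list and function names are made up for the example.

```python
import ipaddress

# Assumption: every "10.4.N.x" range from the notes is a /24 network.
ISCSI_NETWORKS = ["10.4.34.0/24"]
SYNC_NETWORKS = ["10.4.35.0/24", "10.4.34.0/24", "10.4.36.0/24"]


def shared_networks(iscsi, sync):
    """Return the synchronization networks that overlap an iSCSI network."""
    iscsi_nets = [ipaddress.ip_network(n) for n in iscsi]
    sync_nets = [ipaddress.ip_network(n) for n in sync]
    return sorted(
        {str(s) for s in sync_nets for i in iscsi_nets if s.overlaps(i)}
    )


print(shared_networks(ISCSI_NETWORKS, SYNC_NETWORKS))  # ['10.4.34.0/24']
```

An empty list would mean each traffic type has its own dedicated channel.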
26212 kmHV4.kmsi.net 110020 Information "Chkdsk was executed in read-only mode on a volume snapshot.
Checking file system on E:
The type of the file system is NTFS.
Volume label is Local Data HV4.
WARNING! /F parameter not specified.
Running CHKDSK in read-only mode.
Stage 1: Examining basic file system structure ...
1536 file records processed.
File verification completed.
121 large file records processed.
0 bad file records processed.
Stage 2: Examining file name linkage ...
1582 index entries processed.
Index verification completed.
0 unindexed files scanned.
0 unindexed files recovered to lost and found.
Stage 3: Examining security descriptors ...
Security descriptor verification completed.
23 data files processed.
Windows has scanned the file system and found no problems.
No further action is required.
792439807 KB total disk space.
657198312 KB in 531 files.
268 KB in 25 indexes.
0 KB in bad sectors.
91675 KB in use by the system.
65536 KB occupied by the log file.
135149552 KB available on disk.
4096 bytes in each allocation unit.
198109951 total allocation units on disk.
33787388 allocation units available on disk.
" Chkdsk
and
1792 kmHV3.kmsi.net 1157178 Error "Cluster physical disk resource failed periodic health check.
Physical Disk resource name: Cluster Disk 2
Device Number: 4
Device Guid: {d5d6f8df-7081-8b74-e91e-118eb289f818}
Error Code: 1167
Additional reason: ClusDiskReportedFailure
If the reason is ReattachTimeout, it means attaching a new RHS process to the disk resource took too long.
If the reason is ClusDiskReportedFailure, it means the underlying disk device was removed from the system.
If the reason is QuorumResourceFailure, it means this is a Spaces quorum resource.
If the reason is VolumeNotHealthy, it means one of the volumes is not healthy and may need repair." Microsoft-Windows-FailoverClustering 10/19/2020 13:32
Chkdsk brings the volume offline, so be careful with that one. Here is how to run the checks:
https://www.starwindsoftware.com/blog/h ... rwind-vsan.
I would recommend planning downtime and running chkdsk on the underlying storage and the HA devices.