Some HA Disks Not Able to Resync After Reboot
Posted: Fri Nov 26, 2021 4:27 am
by MeCJay12
Hello! I have a 2-node hyper-converged Hyper-V cluster running with a vSAN backend. After a recent reboot, two of my four HA disks aren't able to resync: the sync starts and then fails within 10 minutes. The other two disks resynced without an issue. The two working disks are Hyper-V's witness disk ("Witness") and a cluster shared volume for a cluster file server ("Data"); the two failing disks are a file share used on a different machine ("Games") and a disk for the cluster VMs ("VMs"). What can I look into to get these disks resynced? Each of my two servers has 2x10G links directly to the other for sync and 2x10G links to the network for heartbeat, Internet, etc.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Fri Nov 26, 2021 5:03 am
by yaroslav (staff)
Hi, that is called "fast" synchronization and is expected to happen when a StarWind VSAN host restarts. Fast sync is the process that synchronizes the data from the "active" server to the "offline" one. It is necessary to ensure the data is identical on both nodes.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Sun Nov 28, 2021 4:26 am
by MeCJay12
Hey yaroslav, thanks for the response. Both failing devices are trying to do a full sync. They estimate it will take ~90 minutes but fail after 5-10 minutes, sit idle for a while, and then try again. My other disks were able to do a fast sync and recover as you described.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Mon Nov 29, 2021 8:41 am
by yaroslav (staff)
Hi,
Please share the logs with me. Collect the logs from both servers as described here:
https://knowledgebase.starwindsoftware. ... collector/.
Share them via Google Drive, OneDrive, SharePoint, etc.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Mon Nov 29, 2021 3:06 pm
by MeCJay12
Logs uploaded to Google Drive. Link DM'd to you.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Fri Dec 03, 2021 3:22 am
by MeCJay12
Bump
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Fri Dec 03, 2021 7:41 am
by yaroslav (staff)
Hi,
Thanks for your patience; I still need a bit more time to check the logs.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Fri Dec 03, 2021 10:57 am
by yaroslav (staff)
Both hosts have incorrect Discovery settings for local targets. See more here
https://www.starwindsoftware.com/resour ... rver-2016/ under Provisioning StarWind HA Storage to Windows Server Hosts.
You are also running an old build; please update.
I also see huge delays for iqn.2008-08.com.starwindsoftware:hvlabb.ad.cshaheen.tech-games4 on B, the synchronized node, so you may also need to restart the service on B. Please back up the VMs on the affected CSV -> stop the VMs using the affected CSV -> restart the service on B -> observe synchronization.
If B also ends up out of synchronization (i.e., the affected device is not synchronized on either A or B), go to C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell on B and run SyncHaDevice.ps1 for HAImage5.
Please also note that your system has an unsupported network configuration: there is no dedicated physical synchronization link, and iSCSI and management traffic are mixed on the same links. That said, I also doubt that this deployment meets our best practices:
https://www.starwindsoftware.com/best-p ... practices/.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Sat Dec 04, 2021 1:43 am
by MeCJay12
I've updated the Discovery settings.
Updated the build. Now I'm getting an error about console and service being different versions.
I tried rebooting B and that didn't help the sync.
I ran the script on B. The output was `Device HAImage5 is synchronized` though it was already synchronized.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Sun Dec 05, 2021 9:14 am
by yaroslav (staff)
Quote: "Now I'm getting an error about console and service being different versions."
This is not an error but a warning, which can be ignored in most cases. You can also update the console to make it go away.
Full synchronization should start when the service on B is restarted. Did it finish?
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Mon Dec 06, 2021 4:20 am
by MeCJay12
No, synchronization continues to fail.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Mon Dec 06, 2021 4:46 am
by yaroslav (staff)
I see huge delays for that volume. Did you restart the service on the active side the way I suggested?
Also, did StarWind VSAN change the synchronization type to Full synchronization?
These delays appear to come from the underlying storage. Could you please tell me what the underlying storage configuration is?
Finally, can I have the updated logs, please?
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Mon Dec 06, 2021 5:28 pm
by MeCJay12
I followed your steps, except that instead of restarting the service I rebooted the whole node. I can try again if you tell me exactly which services to restart.
The synchronization has always been Full (as far as I know).
The underlying storage is 5 x Samsung SSD 860 EVO 4TB drives in a RAID 0 using a Dell H710 mini.
Fresh logs added to the same Google Drive link as before.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Mon Dec 06, 2021 7:56 pm
by yaroslav (staff)
I will check the logs in due course.
Also, we do not recommend RAID0. Please see the recommended settings at
https://knowledgebase.starwindsoftware. ... ssd-disks/
Full synchronization is expected because the active node was restarted as part of troubleshooting.
Please note that the network configuration of your setup does not meet the StarWind VSAN system requirements:
https://www.starwindsoftware.com/system-requirements.
Will keep you posted on log investigation.
Re: Some HA Disks Not Able to Resync After Reboot
Posted: Fri Dec 10, 2021 9:16 am
by yaroslav (staff)
I see the following event on B every time the synchronization drops:
12/6 12:09:12.567314 1e3c IMG: *** ImageFile_IoCompleted: Disk operation failed. Disk path: D:\VMs4\VMs4.img. Error code: (1).
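To check how often this error recurs and for which image files, the matching lines can be pulled out of a log with a short script. This is a minimal Python sketch, assuming every failure line has the same layout as the single line quoted above; the regex is inferred from that one example and is not a documented StarWind log format, and the error code is captured without interpreting it.

```python
import re
from collections import Counter

# Pattern inferred from the one ImageFile_IoCompleted line quoted above
# (assumption: other log lines of this kind use the same layout).
IO_FAILURE = re.compile(
    r"(?P<date>\d{1,2}/\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-fA-F]+) IMG: \*\*\* ImageFile_IoCompleted: "
    r"Disk operation failed\. Disk path: (?P<path>.+?)\. "
    r"Error code: \((?P<code>\d+)\)\."
)


def parse_io_failures(lines):
    """Yield one dict per line that matches the I/O-failure pattern."""
    for line in lines:
        match = IO_FAILURE.match(line.strip())
        if match:
            record = match.groupdict()
            # Keep the code as an integer; its meaning is not decoded here.
            record["code"] = int(record["code"])
            yield record


def failures_per_image(lines):
    """Count failed disk operations per .img path."""
    return Counter(rec["path"] for rec in parse_io_failures(lines))


if __name__ == "__main__":
    sample = [
        r"12/6 12:09:12.567314 1e3c IMG: *** ImageFile_IoCompleted: Disk operation failed. Disk path: D:\VMs4\VMs4.img. Error code: (1).",
        "an unrelated log line",
    ]
    for rec in parse_io_failures(sample):
        print(rec)
    print(failures_per_image(sample))
```

Feeding a whole log file (e.g. `open(path, encoding="utf-8", errors="replace")`) into `failures_per_image` would show whether the failures are confined to one image, such as the VMs4 image above.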
Please try recreating the replica using the RemoveHAPartner and AddHAPartner scripts, running them from the healthy node. Make sure all devices on a given node have the same device priority. See more on priorities here:
https://forums.starwindsoftware.com/vie ... f=5&t=5731.
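The priority rule above (every device on one node shares the same priority) can be checked mechanically before re-adding the partner. This is a minimal Python sketch over a hypothetical, hand-filled inventory mapping each node to its HA devices and their priorities; it is not a StarWind API, and the integer priority values are placeholders.

```python
from typing import Dict, List

# Hypothetical inventory: node name -> {HA device name -> priority}.
# Not read from StarWind; fill it in by hand from the management console.
Inventory = Dict[str, Dict[str, int]]


def nodes_with_mixed_priorities(inventory: Inventory) -> List[str]:
    """Return the nodes whose devices do not all share a single priority."""
    mixed = []
    for node, devices in inventory.items():
        # More than one distinct priority on the same node breaks the rule.
        if len(set(devices.values())) > 1:
            mixed.append(node)
    return sorted(mixed)


if __name__ == "__main__":
    # Placeholder values; device names reuse the ones from this thread.
    example = {
        "A": {"Witness": 1, "Data": 1, "Games": 2, "VMs": 1},
        "B": {"Witness": 2, "Data": 2, "Games": 2, "VMs": 2},
    }
    print(nodes_with_mixed_priorities(example))  # ['A']
```

In this example, node A would need the "Games" device's priority aligned with its other devices.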
If you have no data on that HA device, try recreating it from scratch.
Alternatively, you can create a new IMG, migrate the data to it, and replicate it. Please note that the iSCSI target becomes unavailable briefly while the .img is converted to an HA device.
Also, note that you are running an unsupported network configuration for this system.