Hyper-V 2 node cluster consistently restarts full sync

yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Mon Dec 15, 2025 2:16 pm

There are vSwitches on your network diagram.
Could you please tell me more about the storage configuration, too? Are there any underlying storage processes running?
Seeing logs from both hosts would be nice.
Electrum
Posts: 24
Joined: Tue Oct 08, 2024 2:22 pm

Mon Dec 15, 2025 2:29 pm

Hi,

Yes, each physical NIC is set to a vSwitch (Sync and iSCSI). The vSwitch is then assigned the IP. vSwitch_01 and vSwitch_02 are used for VM traffic and heartbeats; they aren't directly related to replication or iSCSI.

Here is a Mega link to the latest log files:

Host 01: https://mega.nz/file/DR1R2DJZ#beraNp1v8 ... T6lIjNblRU
Host 02: https://mega.nz/file/Wd0H3JbC#FZpGxEhGV ... IACh1-ykhI
yaroslav (staff)

Mon Dec 15, 2025 3:06 pm

Do you use VSAN as a Windows application? If so, remove the Hyper-V switches (beware of the connectivity drop).
There are storage delays on 02.

12/15 2:48:12.496387 2bc4 IMG: ImageFile_ScsiCompleteRequest: Warning(Time Request EXEC): request(0x0000015925D91220) ssc(0x0000015926AA7240) function(Execute SCSI Command) opCode(0x8A), timeExecRequest = 14750 ms, g_cmdExecTimeWarningLimitInSec = 3 s. Device: 'F:\starwind\datastore03\datastore03.img'
12/15 2:48:12.496414 2bc4 Common: CStarWindStorageDevice::AsyncReadWriteCompleted: Underlying storage request(0x00000159268C8F60, opcode 0x8A) execution time is 14750 ms.

Please reduce the replication priority (stop the service on the node that undergoes synchronization > edit the header under /headers > set 50 to 5). Please also check what is going on with the storage (hearing more about the configuration will be helpful). Delays like these do not look healthy.
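As a rough illustration of how to spot such delays across a whole log file, here is a minimal Python sketch. It assumes the warning lines keep the exact wording quoted above; the regular expression, the `slow_requests` helper, and the `sample` list are hypothetical names made up for this example and are not part of any StarWind tooling.

```python
import re

# Pattern derived only from the warning line quoted above.
# Other log lines (or other builds) may use different wording.
EXEC_TIME = re.compile(
    r"timeExecRequest = (?P<ms>\d+) ms, "
    r"g_cmdExecTimeWarningLimitInSec = (?P<limit_s>\d+) s\. "
    r"Device: '(?P<device>[^']+)'"
)


def slow_requests(lines):
    """Yield (device, exec_ms, limit_ms) for each execution-time warning."""
    for line in lines:
        m = EXEC_TIME.search(line)
        if m:
            yield m["device"], int(m["ms"]), int(m["limit_s"]) * 1000


# The line from this post, used as sample input.
sample = [
    r"12/15 2:48:12.496387 2bc4 IMG: ImageFile_ScsiCompleteRequest: "
    r"Warning(Time Request EXEC): request(0x0000015925D91220) "
    r"ssc(0x0000015926AA7240) function(Execute SCSI Command) opCode(0x8A), "
    r"timeExecRequest = 14750 ms, g_cmdExecTimeWarningLimitInSec = 3 s. "
    r"Device: 'F:\starwind\datastore03\datastore03.img'",
]

results = list(slow_requests(sample))
for device, ms, limit_ms in results:
    print(f"{device}: {ms} ms ({ms / limit_ms:.1f}x the {limit_ms} ms warning limit)")
```

To scan a real log, the lines of an opened log file could be passed to `slow_requests` instead of `sample`; grouping the results by device would show whether the delays are limited to one datastore.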
Electrum

Mon Dec 15, 2025 3:54 pm

Hello,

I've removed the vSwitches and configured the IPs directly on the NICs. I've also lowered the sync priority to 50% as you requested. Datastore03 isn't currently in use, but the problem isn't limited to the device on that disk; it affects all replicated devices. I'm running a full sync after these changes to see if it makes any difference.
yaroslav (staff)

Mon Dec 15, 2025 4:46 pm

Thanks for your update!
Did you stop it on both nodes (please see the causes for full sync https://knowledgebase.starwindsoftware. ... may-start/)?
Please stop the service and set 50 to 5 in the headers.
Electrum

Mon Dec 15, 2025 4:49 pm

Hello,

I did stop it on both. I was going to try to recreate everything again. I assume as long as I don't delete the .img on the host with the correct data I'll be able to import an existing image after I delete all headers and all config on both nodes?
yaroslav (staff)

Mon Dec 15, 2025 4:53 pm

Full sync and downtime are expected. There will be no storage provider during that time.
Electrum

Mon Dec 15, 2025 4:55 pm

Hi,

Yes that is fine. The local disk should be available shortly after if I am importing an existing disk, correct?
yaroslav (staff)

Mon Dec 15, 2025 5:03 pm

I don't follow.
Could you please clarify what you mean by 'importing' in this context?
Electrum

Mon Dec 15, 2025 5:10 pm

When I add a host in StarWind, I have the option to import an existing device. From there, I can configure replication to the HA partner. That is what I am referring to. As long as that device remains intact, the "data" will remain.
yaroslav (staff)

Mon Dec 15, 2025 5:13 pm

There will be a full sync anyway because the bitmap was voided by mutual server stop/initial synchronization. Data can be there, but the service does not "know" about it.
yaroslav (staff)

Mon Dec 15, 2025 5:36 pm

Oh, as a side note. 20084 is not that stable when it comes to handling unmap (see https://www.starwindsoftware.com/release-notes-build) if that's applicable to your storage. Please update at some point.
Also, could you please share info about the underlying storage (disk type, RAID, caching)?
Electrum

Mon Dec 15, 2025 7:14 pm

The disks are all SSDs running in either RAID 1 or RAID 10. The datastores are mirrored on both servers and they consist of:

Datastore 01:
RAID: RAID10
Read Policy: Read Ahead
Write Policy: Write Back
Strip Size: 64K
Consists of:
--6x Samsung 870 @ 1862.5 GB each
--1x Hot spare

Datastore 02:
RAID: RAID1
Read Policy: Read Ahead
Write Policy: Write Back
Strip Size: 64K
Consists of:
--2x INTEL SSDSC2BB01 @ 1118.25 GB each

Datastore 03:
RAID: RAID1
Read Policy: Read Ahead
Write Policy: Write Back
Strip Size: 64K
Consists of:
--2x Samsung 860 @ 931 GB each
yaroslav (staff)

Wed Dec 17, 2025 7:13 pm

Thanks for your update. 870 and 860 are consumer-grade SSDs.
Update to 20086 and reduce the replication priority (stop the service on the node that undergoes synchronization > edit the header under /headers > set 50 to 5). Please also check what is going on with the storage (hearing more about the configuration will be helpful). From what I read here, the priority is still 50%. Storage performance is what is killing the replication now.
Electrum

Thu Dec 18, 2025 12:34 pm

Hello,

Yes, it is still at 50%. I'll reduce it to 5%. To be clear, this is also occurring on the RAID 1 with enterprise drives. What other details about the configuration can I provide?