vSAN Free @ Proxmox - The replication partner is not synchronized

dwma
Posts: 12
Joined: Thu Aug 21, 2025 12:57 pm

Wed Aug 27, 2025 8:16 am

Hi,
I've been doing some tests before going fully into production. I restarted all the Proxmox nodes (3-node cluster), and the vSAN failed to synchronize. Currently only one vSAN appliance is working OK, so I don't have any HA.

It constantly throws the error below for two of the vSAN appliances:
The replication partner “vsan03” is not synchronized
One of the replication partners is not synchronized after several unsuccessful synchronization attempts. Try to decrease the workload, increase the synchronization priority, and perform synchronization again. If you reencounter this alert, please file the support request.

I've already changed the priority to "Faster synchronization" and run a manual sync, but after a few minutes it shows the same error and the sync stops.
The replication network is a full mesh (each server is connected directly to each of the others) via 25GbE Mellanox ConnectX-4 cards, and replication is configured only over these cards.
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Wed Aug 27, 2025 9:02 am

Hi,

Try pinging the replication links. Also, try the VMXNET3 drivers.
Reduce the priority to slower synchronization.
dwma

Wed Aug 27, 2025 10:13 am

I've changed the drivers to VMXNET3 on all vSAN VMs - no luck.

My network looks like this:
pve1 vmbr25a (x.x.100.1) -> pve2 vmbr25a (x.x.100.2)
pve1 vmbr25b (x.x.101.1) -> pve3 vmbr25a (x.x.101.2)

pve2 vmbr25a (x.x.100.2) -> pve1 vmbr25a (x.x.100.1)
pve2 vmbr25b (x.x.102.1) -> pve3 vmbr25b (x.x.102.2)

pve3 vmbr25a (x.x.101.2) -> pve1 vmbr25b (x.x.101.1)
pve3 vmbr25b (x.x.102.2) -> pve2 vmbr25b (x.x.102.1)

All IPs on the replication network use a /30 mask (255.255.255.252), i.e. point-to-point links.
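A quick way to rule out an addressing mistake in a mesh like this is to confirm that both ends of each cable sit in the same /30 and use its two usable host addresses. The sketch below is plain POSIX sh; the `10.10` prefix is a hypothetical stand-in for the elided `x.x` part, and the function names are illustrative, not part of Proxmox or StarWind.

```shell
#!/bin/sh
# Sketch: check that both ends of every point-to-point link share one /30.
# ASSUMPTION: "10.10" is a hypothetical stand-in for the elided "x.x" prefix.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  ipi_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$ipi_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if both addresses are distinct usable hosts in the same /30.
same_slash30() {
  s30_a=$(ip_to_int "$1")
  s30_b=$(ip_to_int "$2")
  # 255.255.255.252 == 0xFFFFFFFC: compare the network parts.
  [ $(( s30_a & 0xFFFFFFFC )) -eq $(( s30_b & 0xFFFFFFFC )) ] || return 1
  [ "$s30_a" -ne "$s30_b" ] || return 1
  # In a /30 only host offsets 1 and 2 are usable (0 = network, 3 = broadcast).
  for s30_h in $(( s30_a & 3 )) $(( s30_b & 3 )); do
    [ "$s30_h" -eq 1 ] || [ "$s30_h" -eq 2 ] || return 1
  done
}

# One physical link per line: "host:bridge:ip host:bridge:ip".
LINKS='pve1:vmbr25a:10.10.100.1 pve2:vmbr25a:10.10.100.2
pve1:vmbr25b:10.10.101.1 pve3:vmbr25a:10.10.101.2
pve2:vmbr25b:10.10.102.1 pve3:vmbr25b:10.10.102.2'

# Print OK/BAD per link; return 1 if any link is mis-addressed.
check_links() {
  cl_rc=0
  while read -r cl_left cl_right; do
    if same_slash30 "${cl_left##*:}" "${cl_right##*:}"; then
      echo "OK   $cl_left <-> $cl_right"
    else
      echo "BAD  $cl_left <-> $cl_right"
      cl_rc=1
    fi
  done <<EOF
$LINKS
EOF
  return "$cl_rc"
}
```

Calling `check_links` prints one OK/BAD line per cable and returns non-zero if any pair is mis-addressed; edit `LINKS` to hold the real addresses first.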

I can ping all interfaces, e.g. from pve1 to the pve2 vmbr25a and pve3 vmbr25a interfaces, and likewise from pve2 and pve3 according to the network "diagram" above.
I also changed the replication priority to the default "Balanced"; it works for a few minutes, then replication stops.
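Because the sync drops only after a few minutes, a single successful ping per interface may not reveal intermittent loss. A slightly longer test per link, sketched below with the same hypothetical `10.10` prefix, flags any packet loss. It parses the `ping` summary line, so it assumes the iputils/BusyBox output format (", 0% packet loss"); the helper names are illustrative.

```shell
#!/bin/sh
# Sketch: repeated ping over each replication link to surface intermittent loss.
# ASSUMPTIONS: hypothetical "10.10" prefix; iputils/BusyBox-style ping summary line.

# Peer addresses reachable from each node, taken from the layout above.
peers_of() {
  case $1 in
    pve1) echo "10.10.100.2 10.10.101.2" ;;
    pve2) echo "10.10.100.1 10.10.102.2" ;;
    pve3) echo "10.10.101.1 10.10.102.1" ;;
    *) return 1 ;;
  esac
}

# Send 20 probes per peer; any loss on a direct cable is flagged.
ping_peers() {
  pp_list=$(peers_of "$1") || { echo "unknown node: $1" >&2; return 2; }
  pp_rc=0
  for pp_ip in $pp_list; do
    if ping -c 20 -W 1 "$pp_ip" 2>&1 | grep -q ', 0% packet loss'; then
      echo "OK    $pp_ip"
    else
      echo "LOSS  $pp_ip"
      pp_rc=1
    fi
  done
  return "$pp_rc"
}
```

For example, run `ping_peers pve1` on pve1, and the equivalents on pve2 and pve3, ideally while a sync is running.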
yaroslav (staff)

Wed Aug 27, 2025 10:44 am

Please pull the logs from all nodes (Log in to the Web management interface -> Select the gear-shaped button -> Expand the VM name view -> Support Bundle). Upload them here https://www.starwindsoftware.com/support-form using 1364582 as your reference.
dwma

Wed Aug 27, 2025 11:04 am

yaroslav (staff) wrote:
Wed Aug 27, 2025 10:44 am
Please pull the logs from all nodes (Log in to the Web management interface -> Select the gear-shaped button -> Expand the VM name view -> Support Bundle). Upload them here https://www.starwindsoftware.com/support-form using 1364582 as your reference.
Done.
yaroslav (staff)

Wed Aug 27, 2025 1:47 pm

Please reduce the synchronization priority to 5%.
I can see that the replication is stopping due to underlying storage delays. I think it might be related to the ZFS volume you use.
Could you please tell me more about the storage configuration?
dwma

Thu Aug 28, 2025 12:33 pm

yaroslav (staff) wrote:
Wed Aug 27, 2025 1:47 pm
Please reduce the synchronization priority to 5%.
I can see that the replication is stopping due to underlying storage delays. I think it might be related to the ZFS volume you use.
Could you please tell me more about the storage configuration?
Where can I configure the sync priority as a percentage? I don't see that option; I only have these options: Balanced (default), Faster synchronization, and Faster client request processing.

Each node has 6x 600GB 10K SAS HDDs (for vSAN) plus 2x identical disks for the OS (Proxmox on a ZFS mirror). They run on a P440ar controller in HBA mode.
These 6x vSAN disks are passed through (direct-passthrough) to the vSAN VM.
yaroslav (staff)

Thu Aug 28, 2025 1:05 pm

Set the parameter to "Faster client request processing" first. If that doesn't help, use haSyncPriority and set the value to 5.
How is the storage connected to the CVM? Are the disks QEMU, RDM, or pass-through devices?
dwma

Mon Sep 01, 2025 5:18 am

yaroslav (staff) wrote:
Thu Aug 28, 2025 1:05 pm
Set the parameter to "Faster client request processing" first. If that doesn't help, use haSyncPriority and set the value to 5.
How is the storage connected to the CVM? Are the disks QEMU, RDM, or pass-through devices?
I changed the priority to "Faster client request processing"; it didn't help.
I cannot see an option to set haSyncPriority to a numerical value through the web GUI.

The disks are passed through to the CVM. On each Proxmox node I used the command below to attach each of the 6 local disks to that node's CVM:
qm set 901 --scsi1 /dev/disk/by-id/wwn-0x5000c5005fxxxx
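For reference, `qm set <vmid> --scsiN /dev/disk/by-id/...` attaches the whole physical disk to the VM as a QEMU SCSI device; the controller itself is not passed through. Repeating the command for six disks is easy to get wrong, so a small dry-run helper that only prints the commands can help. The helper name `attach_cmds` is hypothetical, and the disk IDs in the usage line are placeholders, not the poster's real WWNs.

```shell
#!/bin/sh
# Sketch: print (do not run) one "qm set" line per disk, numbering scsi slots from 1.
# ASSUMPTION: hypothetical helper; pass the VMID first, then the by-id disk names.
attach_cmds() {
  ac_vmid=$1
  shift
  ac_slot=1
  for ac_disk in "$@"; do
    printf 'qm set %s --scsi%d /dev/disk/by-id/%s\n' "$ac_vmid" "$ac_slot" "$ac_disk"
    ac_slot=$((ac_slot + 1))
  done
}
```

`attach_cmds 901 wwn-DISK1 wwn-DISK2 wwn-DISK3 wwn-DISK4 wwn-DISK5 wwn-DISK6` prints six `qm set` lines (scsi1 to scsi6); after checking them, the output can be piped to `sh`. Take the real IDs from `ls -l /dev/disk/by-id/`, skipping the `-part` entries and the two boot-mirror disks.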
yaroslav (staff)

Mon Sep 01, 2025 5:23 am

ZFS is generally not recommended for virtualization workloads. Could you please let me know what those disks are and what ZFS configuration you have?
Try setting replication to 5% with the script.
dwma

Mon Sep 01, 2025 5:28 am

yaroslav (staff) wrote:
Mon Sep 01, 2025 5:23 am
ZFS is generally not recommended for virtualization workloads. Could you please let me know what those disks are and what ZFS configuration you have?
Try setting replication to 5% with the script.
I cannot pass through the P440 controller in HBA mode, because the Proxmox boot disks are connected to the same backplane/controller. These are 6x 2.5" 600GB 10K SAS disks.

How do I set 5% replication through a script? Which script do I need to edit?
yaroslav (staff)

Mon Sep 01, 2025 5:31 am

yaroslav (staff) wrote:
Thu Aug 28, 2025 1:05 pm
Set the parameter to "Faster client request processing" first. If that doesn't help, use haSyncPriority and set the value to 5.
You can create a physical RAID and use something like raw device mapping (i.e., map the VD to the CVM in Proxmox).
dwma

Mon Sep 01, 2025 5:35 am

yaroslav (staff) wrote:
Mon Sep 01, 2025 5:31 am
yaroslav (staff) wrote:
Thu Aug 28, 2025 1:05 pm
Set the parameter to "Faster client request processing" first. If that doesn't help, use haSyncPriority and set the value to 5.
You can create a physical RAID and use something like raw device mapping (i.e., map the VD to the CVM in Proxmox).
This is not an option in this case, because I would have to rebuild the whole PVE cluster: the boot disks are on the same controller as the CVM disks. As for the option/script, where can I find it so I can try a slower sync?
yaroslav (staff)

Mon Sep 01, 2025 5:42 am

Just to make sure I am reading it right: you have a VD for the boot volume and a set of disks that you are passing through into the CVM to create ZFS.
Please let me know what disks you use: are they HDD or SSD?

What I am suggesting is:
1. Destroy the ZFS configuration on one side (i.e., remove the replication partner, then dismantle the storage configuration).
2. Create a virtual disk on your RAID controller and present it as a raw device (see more at https://forum.proxmox.com/threads/does- ... sk.158096/; it is not pass-through).
3. Set up the volume inside the StarWind CVM.
4. Start replication to it.

If none of this is possible, please use mdadm instead of ZFS.
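If mdadm is used inside the CVM instead of ZFS, creating the array is a single command. The sketch below only builds and prints that command so it can be reviewed before running it; RAID 10, `/dev/md0`, and the `/dev/sdb` to `/dev/sdg` member names are assumptions for illustration, not a StarWind recommendation, and `mdadm_create_cmd` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build (but do not run) an "mdadm --create" command for the CVM.
# ASSUMPTIONS: RAID level, md device and member names are illustrative only.

# Usage: mdadm_create_cmd <md-device> <raid-level> <disk> <disk> [...]
mdadm_create_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: mdadm_create_cmd <md> <level> <disk> <disk> [...]" >&2
    return 1
  fi
  mc_md=$1
  mc_level=$2
  shift 2
  # --raid-devices must equal the number of member disks listed.
  echo "mdadm --create $mc_md --level=$mc_level --raid-devices=$# $*"
}
```

With six 600GB disks, RAID 10 would give roughly 1.8TB usable (6 × 600GB / 2). After creating the array, `cat /proc/mdstat` shows its state and initial resync progress.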

For scripts, I believe you have the Windows-based Management Console. Go to C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell on the Windows host where the console is installed.
dwma

Mon Sep 01, 2025 6:34 am


P440 (HBA MODE) -> Backplane 	-> 2x "RAW" disks used for Proxmox  -> ZFS Mirror
P440 (HBA MODE) -> Backplane 	-> 6x "RAW" disks passed through to the CVM
All these are HDDs.

And no, I'm not using any Windows-based management console for vSAN; I only use the CVM GUI for management. Where can I download the StarWind Windows console, since it wasn't bundled with the Proxmox package?