deleting/adding target triggering multiple resyncs

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Mon Apr 01, 2024 9:55 am

Hi. I recently upgraded StarWind Virtual SAN on my two physical dedicated Supermicro servers from a much older version (circa 2017-2018) to version 8.0.15159.0. I had some stability problems back in the day, and when I found a stable version I just stuck with that one.

Now, after some years, I finally upgraded my HA pair to this version. Usage seems fine, and maybe a little quicker. I upgraded mostly because I have a few LUNs on WD Red SA500 4TB SSDs, and they have become really slow to write to. So I was hoping for better trim/unmap support to maybe help with garbage collection etc. As part of the troubleshooting I freed up and removed one target on one of the nodes. Doing so triggered a resync for all targets for some reason. After the resync finished, I removed the target from the second node, and that also triggered a resync of all targets.
I tried recreating a single non-HA device added to a new target a couple of times; once everything was normal, and once it triggered a resync again (non-HA device).
I'm now really scared to recreate the full HA device, as it would be quite disastrous if I ended up with both nodes not being synchronized. I never had problems like this on the old version. (The license is for unlimited storage, 2 nodes.)

Any advice? Is resync expected like this?
yaroslav (staff)
Staff
Posts: 2364
Joined: Mon Nov 18, 2019 11:11 am

Mon Apr 01, 2024 10:53 am

I had some stability problems back in the day, and when I found a stable version I just stuck with that one.
Sad to read that. May I ask which build you were using?
So I was hoping for better trim/unmap support to maybe help with garbage collection etc.
Do you see high background writes hammering your storage? If you do, try the following workaround for vSphere.
1. Inflate ALL your disks.
2. Turn off space reclamation for all hosts https://docs.vmware.com/en/VMware-vSphe ... 3F39A.html
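For reference, step 2 can also be done from the ESXi command line per datastore. A minimal sketch, assuming a VMFS6 datastore named "Datastore1" (a placeholder, replace with your own) and shell access to the host:

```shell
# Disable automatic space reclamation (UNMAP) on a VMFS6 datastore.
# "Datastore1" is a hypothetical volume label; substitute your datastore name.
esxcli storage vmfs reclaim config set --volume-label=Datastore1 --reclaim-priority=none

# Confirm the new setting:
esxcli storage vmfs reclaim config get --volume-label=Datastore1
```

Setting the reclaim priority back to `low` later re-enables automatic reclamation on that datastore.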
As part of the troubleshooting I freed up and removed one target on one of the nodes. Doing so triggered a resync for all targets for some reason.
Can you please share the logs and provide the timestamp when that happened?
I tried recreating a single non-HA device added to a new target a couple of times, and one time all was as normal and one time it triggered a resync again (non-HA device).
Just to make sure I am reading it right. Did it trigger full synchronization for all HA devices?
lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Mon Apr 01, 2024 11:24 am

I apparently used version 8.0.9996.0 from 2016.

I saw very low usage at the time I ran the tests, so that shouldn't be the problem.

Should I just attach the log here in the forum?

The events triggered fast syncs, so the node I made the changes on was the one that needed to sync from the other node. It only took a few minutes, but I don't understand why it happened (multiple times).

But of course, my fear is that creating an HA device will trigger "unsync" on both sides of the HA pair, making things much worse and requiring me to mark one node as "forced synchronized". If that is a risk, I would need to schedule downtime and shut down all connected VMs in my vSphere environment.
yaroslav (staff)
Staff
Posts: 2364
Joined: Mon Nov 18, 2019 11:11 am

Mon Apr 01, 2024 11:59 am

You have missed a couple of really stable ones, I'd say.
Should I just attach the log here in the forum?
You could email them to support@starwind.com using this thread and 1136764 as your reference.
The events triggered fast-syncs.
If those were fast synchronizations, this could be caused by storage hiccups. Could you tell me more about the storage configuration? If StarWind VSAN is running in a VM, please let me know how the storage is connected to the VM.
But of course, my fear is that creating an HA device will trigger "unsync" on both sides of the HA pair, making things much worse and requiring me to mark one node as "forced synchronized". If that is a risk, I would need to schedule downtime and shut down all connected VMs in my vSphere environment.
Tell me more about the networking in between and the storage configuration. A detailed connectivity scheme will be much appreciated.
And, please share the logs.
lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Mon Apr 01, 2024 12:23 pm

Email sent.
yaroslav (staff)
Staff
Posts: 2364
Joined: Mon Nov 18, 2019 11:11 am

Mon Apr 01, 2024 12:47 pm

Thanks! I will update you here once I review the logs.