Full Synchronization upon brief network interruption

YoYo
Posts: 9
Joined: Mon Nov 13, 2017 4:59 pm

Mon Apr 02, 2018 3:33 pm

Is there a way to extend the time threshold after which StarWind considers the synchronization partner offline? An example would be a brief unplug and re-plug of the sync network cable. I have tried several settings in my lab environment to make this work, but even a brief interruption seems to cause a full resync. This won't be acceptable in a production environment.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Apr 02, 2018 4:11 pm

Well, if anything is "not acceptable" in a production environment, it is definitely data corruption. Data integrity matters much more than the cost of a triggered resynchronization. Do you agree?
As a storage solution, StarWind has to take care of your data and not expose it to danger. Whenever it considers a situation risky, meaning the data may end up inconsistent, it starts a full synchronization. This mainly applies to setups using L1 cache in write-back mode, but can sometimes apply to setups with no cache configured.
Can you describe your test scenario that causes full synchronization?
YoYo
Posts: 9
Joined: Mon Nov 13, 2017 4:59 pm

Mon Apr 02, 2018 4:25 pm

Two servers, running

StarWind Version 8.0.0.11818
12GB Virtual Disk, Flat
512 Byte Sector Size
128 MB Write Back Cache

Two Network Cards, 10GbE
1 Dedicated Heartbeat Channel
1 Dedicated Sync Channel

If there is a brief interruption on the sync channel, the entire 12 GB disk on the unsynchronized node rebuilds. Our goal is to pause iSCSI disk transactions on both nodes while the sync channel is down, for a configurable amount of time, before the partner HA node is marked "unsynchronized" and a full rebuild is triggered.
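
To make the requested behavior concrete, here is a minimal sketch of the grace-period logic described above. It is only a model of the request, not StarWind's implementation: the class name SyncLinkMonitor, the grace_seconds parameter, and the three state names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncLinkMonitor:
    """Hypothetical model of the requested behavior; not StarWind code."""

    grace_seconds: float                  # configurable wait before giving up on the partner
    link_lost_at: Optional[float] = None  # time the sync link was first seen down
    state: str = "synchronized"           # "synchronized", "io_paused", or "not_synchronized"

    def on_tick(self, now: float, link_up: bool) -> str:
        """Update the state from one periodic check of the sync link."""
        if self.state == "not_synchronized":
            # Once the partner is marked out of sync, only a resync can clear it.
            return self.state
        if link_up:
            # Link is healthy, or came back within the grace period: resume I/O.
            self.link_lost_at = None
            self.state = "synchronized"
        elif self.link_lost_at is None:
            # First check with the link down: pause I/O and start the timer.
            self.link_lost_at = now
            self.state = "io_paused"
        elif now - self.link_lost_at >= self.grace_seconds:
            # Grace period expired: mark the partner and accept a rebuild.
            self.state = "not_synchronized"
        return self.state
```

With grace_seconds=30, a cable that is unplugged at t=1 and back by t=5 would only pass through "io_paused" and return to "synchronized"; the partner would be marked "not_synchronized" only if the link stayed down for the full 30 seconds.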
YoYo
Posts: 9
Joined: Mon Nov 13, 2017 4:59 pm

Tue Apr 03, 2018 4:40 pm

Any thoughts on this?
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Wed Apr 04, 2018 10:29 am

Could you share your logs collected using StarWind Log Collector? PM me a link to the download.
As for the configurable amount of time, it currently defaults to 10 seconds. If you want to change that, in the StarWind Management Console go to the Advanced Settings for the selected node:
Attachments
Screenshot.png (13.38 KiB)
YoYo
Posts: 9
Joined: Mon Nov 13, 2017 4:59 pm

Wed Apr 04, 2018 5:00 pm

That setting did not affect the time between the network failure and the "Not Synchronized" condition. I'm not really sure what that setting actually does. I performed the following test:

Two servers, running

StarWind Version 8.0.0.11818
12GB Virtual Disk, Flat
512 Byte Sector Size
128 MB Write Back Cache

Two Network Cards, 10GbE
1 Dedicated Heartbeat Channel
1 Dedicated Sync Channel

I disconnected the dedicated sync channel, NOT the heartbeat channel. The nodes quickly became "Not Synchronized" (after about 13 seconds). This does not seem to happen when both the heartbeat and the sync channels are disconnected simultaneously. In that case, it appeared to wait the configured time before marking the nodes "Not Synchronized".

Our goal is to have this configured parameter respected when only the sync channel, not the heartbeat channel, is disconnected.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Apr 05, 2018 11:29 am

After changing the setting shown in the picture, you need to restart the service for the setting to be applied.
YoYo
Posts: 9
Joined: Mon Nov 13, 2017 4:59 pm

Thu Apr 05, 2018 2:05 pm

We restarted each node (the entire computer) and verified that the setting had persisted on each one. The same behavior still occurs.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Apr 06, 2018 12:17 pm

Please PM me the logs collected using StarWind Log Collector https://knowledgebase.starwindsoftware. ... collector/
I will look through them for a possible cause.