
Long time initial sync

Posted: Thu Sep 30, 2021 3:11 pm
by jdeshin
Hello all!
I've created a 3 TB HA_3 image in my test environment, and the initial sync took about two hours. I plan to use 16 TB images, in which case the initial sync will take about 10 hours. My test environment has two 100 Gbit/s NICs for sync, and I use iSER. The disk throughput was about 450-500 MB/s during sync, and the queue length was about 0.1.
How can I improve sync performance and what about resync time after a disk failure?
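
As a rough sanity check on those figures, sync time can be estimated as image size divided by sustained throughput. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate for the whole sync; the function name is only illustrative.

```python
def sync_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Estimated hours to copy size_tb terabytes at throughput_mb_s MB/s.

    Assumes decimal units and a constant transfer rate.
    """
    size_bytes = size_tb * 1e12
    seconds = size_bytes / (throughput_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Image sizes and throughput range taken from the post above.
    for size_tb in (3, 16):
        for rate in (450, 500):
            print(f"{size_tb} TB at {rate} MB/s: {sync_hours(size_tb, rate):.1f} h")
```

With the numbers from the post this gives roughly 1.7 to 1.9 hours for 3 TB and 8.9 to 9.9 hours for 16 TB, which is consistent with the observed two hours and the projected ten.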

Additionally, I have two hb/iSCSI links in different subnets, and when I add two iSCSI target portals (two links), I only get links for the first portal.
How should I configure two links from the same host for redundancy?

Best regards,
Yury

Re: Long time initial sync

Posted: Fri Oct 01, 2021 1:48 pm
by yaroslav (staff)
Hi,

Extending an HA device does not require full synchronization. Full synchronization happens only if both hosts are shut down.
I also want to mention that resynchronization after a disk failure will not be needed if the RAID array survives.
Additionally, I have two hb/iSCSI links in different subnets, and when I add two iSCSI target portals (two links), I only get links for the first portal.
Could you kindly elaborate on this? Seeing a screenshot will be just great.

Re: Long time initial sync

Posted: Fri Oct 01, 2021 4:00 pm
by jdeshin
Hi Yaroslav!
I have an iSCSI/sync infrastructure like this:
[image: nodes.png]
The only difference is that I have three nodes (10.1.1.3 and 10.1.2.3).
You can see my list of targets below:
[image: targets.png]
and my links:
[image: links.png]
As you can see, host 3 has only one link for each image, although it has two iSCSI network links for redundancy.
Extending an HA device does not require full synchronization. Full synchronization happens only if both hosts are shut down.
I also want to mention that resynchronization after a disk failure will not be needed if the RAID array survives.
I want to use a Storage Spaces simple volume without hardware RAID, because I have three nodes and three copies of the data. Therefore I need to reduce the resync time after a disk failure. Is that possible? Maybe you have some extended configuration parameters, etc.?

Best regards,
Yury

Re: Long time initial sync

Posted: Fri Oct 01, 2021 10:36 pm
by yaroslav (staff)
I've iscsi/sync infrastructure like this:
Do iSCSI and Synchronization go over the same channels? Your diagram does not meet our recommendation on dedicated iSCSI and Synchronization links.
I want to use storage spaces simple volume without hardware RAID
Software RAID is always slower than hardware RAID. Furthermore, since it is just an entity inside the OS, there are more stability challenges. This applies to any software RAID by design. True, for some software RAIDs, recovering data after a RAID failure is easier than with hardware RAID (e.g., if the controller itself fails, it is harder to rescue data).
I would recommend a hardware RAID for performance-focused deployments. If hardware RAID is not possible, please create StarWind HA devices on individual disks (i.e., with no server-scale redundancy).

Re: Long time initial sync

Posted: Sat Oct 02, 2021 5:48 am
by jdeshin
Dear Yaroslav,
Do iSCSI and Synchronization go over the same channels?
No, I have two separate dedicated 25 Gbit/s links for iscsi/hb.
If hardware RAID is not possible, please create StarWind HA devices on individual disks (i.e., with no server-scale redundancy)
In this case the performance of the StarWind image will be limited to the performance of a single disk :(

Best regards,
Yury

Re: Long time initial sync

Posted: Mon Oct 04, 2021 4:23 am
by jdeshin
Dear Yaroslav,

I've tried placing the HA image on an individual disk (Micron 5200 MAX 480 GB), and the sync time was about one hour. The sync bandwidth was limited to about 127 MB/s with 1 MB sync blocks. So, for a 1.92 TB disk, the sync time was about 4 hours, and for a 3.84 TB disk, 8 hours.
Could you please explain how the sync algorithm works in common cases (or is it a black box that you don't give any information about)?
Is there no way to improve sync performance?

With best regards,
Yury

Re: Long time initial sync

Posted: Tue Oct 05, 2021 4:54 am
by yaroslav (staff)
Thank you for your update. Synchronization runs in 256K Sequential blocks. Could you benchmark individual disks and ones in the software RAID under 256K sequential workloads, please?
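
For illustration only, a 256K sequential write test can be approximated with a few lines of Python. This is a minimal sketch, not a replacement for a dedicated benchmarking tool such as diskspd or fio: it writes through the OS cache and only flushes at the end, so the result will differ from what unbuffered I/O would show. The function name, default sizes, and file name are hypothetical.

```python
import os
import tempfile
import time


def sequential_write_mb_s(path, block_size=256 * 1024, total_bytes=64 * 1024 * 1024):
    """Write total_bytes to path in sequential block_size chunks; return MB/s.

    The final flush + fsync is included in the timing so that data still
    sitting in the OS cache is counted as written only once it reaches disk.
    """
    block = os.urandom(block_size)          # incompressible payload
    blocks = total_bytes // block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return blocks * block_size / elapsed / 1e6


if __name__ == "__main__":
    # Point the directory at the volume under test; a temp dir is used here.
    with tempfile.TemporaryDirectory() as d:
        rate = sequential_write_mb_s(os.path.join(d, "seqtest.bin"))
        print(f"256K sequential write: {rate:.0f} MB/s")
```

Running the same function against a file on an individual disk and on the software RAID volume gives two numbers that can be compared directly, with the caching caveat above in mind.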

Re: Long time initial sync

Posted: Tue Oct 05, 2021 1:39 pm
by jdeshin
Synchronization runs in 256K Sequential blocks.
It's amazing, because my Windows counter Avg. Disk Bytes/Write shows me 1 MB. Is that block size used for creating the HA image?

I'll try to run performance tests with 256 KB sequential IO, but at this time my results are: random 128K IO is about 390 MB/s for an individual disk and 3000 MB/s for a storage space with 8 disks, and sync IO on the storage space is about 250 MB/s.

What does the syncSessionCount option in the CreateHA_3 script do?

Best regards,
Yury

Re: Long time initial sync

Posted: Tue Oct 05, 2021 2:59 pm
by jdeshin
Additionally some results:
HA volume type - 3-way mirror
HA volume size - 3TB
sync time - 4h
Network activity RDMA between hosts - ~0 bit/sec
Network activity non-RDMA between hosts - ~48 KBytes/sec
Avg disk bytes/write - 1048576
Writes/sec - 237
Avg disk sec/write - 1ms
disk bytes/sec - 248 723 435
Avg disk queue length - 0.238
CPU % - 0,082
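
These counters can be cross-checked against each other. The sketch below multiplies writes per second by write size to get throughput, and applies Little's law (average outstanding I/Os = arrival rate x latency) to get the expected queue length. The variable and function names are illustrative, and 1 ms is taken as exactly 0.001 s.

```python
# Counter values copied from the list above.
WRITES_PER_SEC = 237
BYTES_PER_WRITE = 1_048_576        # Avg disk bytes/write (1 MiB)
WRITE_LATENCY_S = 0.001            # Avg disk sec/write (1 ms)
MEASURED_BYTES_PER_SEC = 248_723_435


def throughput_bytes_per_sec(iops: float, io_size: int) -> float:
    """Throughput implied by an I/O rate and a fixed I/O size."""
    return iops * io_size


def avg_outstanding_ios(iops: float, latency_s: float) -> float:
    """Little's law: average number of I/Os in flight."""
    return iops * latency_s


if __name__ == "__main__":
    implied = throughput_bytes_per_sec(WRITES_PER_SEC, BYTES_PER_WRITE)
    queue = avg_outstanding_ios(WRITES_PER_SEC, WRITE_LATENCY_S)
    hours = 3e12 / MEASURED_BYTES_PER_SEC / 3600   # 3 TB, decimal units
    print(f"implied throughput: {implied / 1e6:.1f} MB/s")
    print(f"expected queue length: {queue:.3f}")
    print(f"estimated 3 TB sync time: {hours:.1f} h")
```

The implied throughput (about 248.5 MB/s) and queue length (0.237) agree with the measured 248 723 435 bytes/sec and 0.238, and the estimated sync time of about 3.4 hours is in the same range as the reported 4 hours. A queue length well below 1 suggests that, on average, less than one write is in flight at a time, so the disk itself may not be the limiting factor.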

Best regards,
Yury

Re: Long time initial sync

Posted: Fri Oct 08, 2021 10:39 am
by yaroslav (staff)
Did you try creating a 2-way mirror? What you can do is wait for it to fully synchronize and then trigger a full sync manually (i.e., by stopping both services and starting them again).
Let me know if that provides you with different readings.