Extremely slow sync data rate

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

mpg
Posts: 5
Joined: Sat Nov 30, 2019 6:39 am

Wed Jan 15, 2020 1:42 pm

I am just starting to tinker with setting up VSAN Free via PowerShell on my two-node Hyper-V Server 2019 setup. I have created several "HA Devices" between my two nodes and have run into the same problem each time: once I create the device, the sync uses at most one interface and seems to top out at roughly 48-56kbs of bandwidth.
[Attachment: extremely-slow.png]
To outline the setup:
  • I have 2 Dell R710 servers, running Hyper-V 2019 (free).
  • Both have a 4-port Broadcom built-in network card, as well as a second PCIe Broadcom 4-port card (identical chips), for a total of 8x 1GbE interfaces.
  • The first port of the built-in is for LAN traffic, basically so I can connect to them from LAN.
  • The next 3 ports are for iSCSI/Heartbeat.
  • All 4 ports of the expansion card are for Sync.
  • Each port is in its own subnet between the servers, so port 2 on the integrated is 192.168.242.0/29, .1 for node 1 and .2 for node 2.
  • Each port is directly connected to its counterpart on the other node via 1ft Cat6 cables; no switches are involved.
  • I have disabled the firewall rules for the "private" network profile and forced the networks to "private", since Windows classifies them as "public" by default. I am not sure if there is something I should be doing differently when setting up direct Ethernet links so that they are not left "unidentified" and classified as "public" by the OS.
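For anyone hitting the same "unidentified/public" behavior on direct links, one common workaround is to flip the connection profile with PowerShell rather than fighting the firewall rules. This is a sketch, not something from the setup above; it reclassifies every currently-Public connection, so narrow the filter (e.g. by -InterfaceAlias) if the box also has a real public-facing link:

```powershell
# Sketch: force direct-link connections out of the Public profile so the
# Private firewall rules apply. Filter by InterfaceAlias to be selective.
Get-NetConnectionProfile |
    Where-Object { $_.NetworkCategory -eq 'Public' } |
    Set-NetConnectionProfile -NetworkCategory Private
```

Note this is per-profile, not persistent across network re-identification, so it may need re-applying if Windows re-detects the links.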

I have enabled jumbo frames on all 7 adapters used in this test, and tested/verified them with

Code: Select all

ping -f -l 8972 192.168.242.2
I set the jumbo frame size to 9014 using PowerShell:

Code: Select all

Set-NetAdapterAdvancedProperty -Name "*" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
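In case it helps anyone reproducing this, the setting can also be verified without pinging, via a read-only query of the same registry keyword:

```powershell
# Read-only check: confirm the jumbo-packet value took effect on each adapter.
Get-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket" |
    Format-Table Name, DisplayName, RegistryValue
```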

Ping with forced fragmenting fails for any payload larger than 8972 bytes. I suspected packet-header overhead, and that checks out: the 9014 jumbo-packet value includes the 14-byte Ethernet header, leaving a 9000-byte IP MTU, and ping's -l sets only the ICMP payload, so 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972.

Normal bandwidth seems OK; I have run iperf between the nodes over each connection and get ~995 Mbit/s with 0.001% loss with parameters like:

Code: Select all

iperf3 -c 192.168.242.1 -B 192.168.242.2 -u -b 1G -l 8972
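(For completeness, the matching listener on the other node would be something along these lines; with -u the loss figure comes from the server-side report.)

```powershell
# Listener on node 1 for the test above; -B pins it to the link under
# test so traffic cannot leak onto another interface.
iperf3 -s -B 192.168.242.1
```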
[Attachment: two-images-one-completes-bandwidth-restored.png]
I initially tried multiple interfaces for both sync and iSCSI/HB on a 400GB image on a SATA drive. Later I tried a 1GB image with only one interface for sync and one for iSCSI/HB. Same deal: a ~48kbs sync rate. I also tried creating a second image (~2GB) while the 1GB one was still syncing, but over different interfaces. The traffic WAS split between the two interfaces, but the total bandwidth stayed the same! Each image's sync rate dropped to half, ~24kbs! Once one completed, the "full" 48kbs of bandwidth was restored to the image still syncing. Additionally, there is an initial burst of ~2-4mbs right after the device is created, probably the PowerShell script talking to the StarWind service on the other node to create the second image file, I suspect.
Anyone got an idea of what I should look at next, or whether there is something obvious I might have missed? I thought the instructions for the 2-node HA device creation PowerShell script were pretty minimal, but there was not much to be confused about. There are a few parameters I know nothing about, like "ALUAOptimized", which I left set to "true", but the interfaces, file paths, size/sector size, etc. were all pretty self-explanatory. I will probably uninstall the free version and re-install with a trial license to see if the management UI gives me any more insight. Thanks in advance to anyone who helps!
Attachments
[Attachment: 2-image-sync-bandwidth-halves.png]
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Wed Jan 15, 2020 2:49 pm

A few questions come to mind after reading your explanation:
1. What is the type of storage you use? What is the RAID configuration?
2. Could you show the part of the script where you configured Sync interfaces? I am afraid there could be an issue with that parameter, so we need to make sure it is configured properly.
3. For initial setups, it might be a good idea to create a small device of 1GB, for instance, and then expand it to the required size with the help of the ExtendDevice.ps1 sample script adjusted to match your requirements.
mpg
Posts: 5
Joined: Sat Nov 30, 2019 6:39 am

Wed Jan 15, 2020 10:11 pm

The disk is a single 1TB ~5400RPM (I think) SATA drive in a Dell R710 with a PERC H700 as the RAID card; the disk is presented as a single-disk RAID 0. I tried this on a SATA SSD in the same configuration a few weeks back and ran into the exact same issue, though I was just hacking around at that point while waiting on parts to complete my servers, so I didn't think much of it and assumed it was related to my setup. I did try making some small images, which exhibited the same problem but obviously completed their sync in a more reasonable time frame. Thank you for your response. Here is a snippet of the "parameters" section of the script, with the values as I modified them for a 1GB image:

Code: Select all

#common
	$initMethod="Clear",
	$size=1024,
	$sectorSize=4096,
#primary node
	$imagePath="My computer\E",
	$imageName="image3_0A",
	$createImage=$true,
	$storageName="",
	$targetAlias="target3_0A",
	$autoSynch=$true,
	$poolName="",
	$syncSessionCount=1,
	$aluaOptimized=$true,
	$cacheMode="wb",
	$cacheSize=128,
	$syncInterface="#p2=192.168.248.2:3260",
	$hbInterface="#p2=192.168.244.2:3260",
#secondary node
	$imagePath2="My computer\E",
	$imageName2="image3_0B",
	$createImage2=$true,
	$storageName2="",
	$targetAlias2="target3_0B",
	$autoSynch2=$true,
	$poolName2="",
	$syncSessionCount2=1,
	$aluaOptimized2=$true,
	$cacheMode2=$cacheMode,
	$cacheSize2=$cacheSize,
	$syncInterface2="#p1=192.168.248.1:3260",
	$hbInterface2="#p1=192.168.244.1:3260"
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 16, 2020 8:46 am

In your snippet you use only one Sync interface and one heartbeat interface. No wonder the sync operation runs via just one link :)
Try adding the required interfaces as comma-separated values:

Code: Select all

$syncInterface="#p2=192.168.248.2:3260,192.168.249.2:3260,192.168.250.2:3260",
Also, you may want to increase the syncSessionCount value to 2 or 3 (this applies on a per-interface basis, i.e. each of the sync interfaces will get the same number of sync sessions).
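Put together, the two suggestions above might look like this in the script's parameters section. This is only a sketch using the subnets already mentioned in this thread; substitute your own addresses and port numbers:

```powershell
# Node 1 side (sketch): three sync links, two sessions per link.
	$syncSessionCount=2,
	$syncInterface="#p2=192.168.248.2:3260,192.168.249.2:3260,192.168.250.2:3260",
# Node 2 side mirrors it with the .1 peer addresses.
	$syncSessionCount2=2,
	$syncInterface2="#p1=192.168.248.1:3260,192.168.249.1:3260,192.168.250.1:3260",
```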
mpg
Posts: 5
Joined: Sat Nov 30, 2019 6:39 am

Thu Jan 16, 2020 1:16 pm

Ah, that was the script for the single-link HA device I created to simplify my setup and isolate the problem. I also created a multi-link HA device in the same format you describe and got the same result: one link in use and a ~48kbs rate on that link.
My multi-link sync/HB config was like this (just node 1, for example):

Code: Select all

$syncSessionCount=1,
$aluaOptimized=$true,
$cacheMode="wb",
$cacheSize=128,
$syncInterface="#p2=192.168.248.2:3260,192.168.249.2:3260,192.168.250.2:3260,192.168.251.2:3260",
$hbInterface="#p2=192.168.242.2:3260,192.168.243.2:3260,192.168.244.2:3260",
I don't really know what sync session count does, or what a "session" is. Should the count equal the interface count, or something else? In all my attempts I never changed it from 1. This snippet from this article (link) was all I came across:
firstNode.SyncSessionCount; Synchronization session count. Make sure you set the value of the variable to “1”;
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Jan 20, 2020 11:08 am

That parameter sets the number of synchronization sessions that will exist on that particular sync connection. With fast links (e.g. 10Gbps+) you can have more than one sync session per link.