Slower performance on primary node

seb_alf · Fri Aug 25, 2023 2:37 pm

Hi,

We set up StarWind VSAN Free in our lab.
We took two Server 2022 machines, installed StarWind VSAN Free on them, and set up a failover cluster.
Everything works really well except for a performance anomaly we can't nail down.

We noticed reduced performance on the primary node when accessing the CSVFS.
Write speeds in virtual containers and when copying big files into the cluster volume directly from the primary node are much slower than on node 2 (unstable 30-240 MB/s compared to a stable 220-240 MB/s).
Drive benchmarks in VMs and general performance also reflect this behavior.

We tested 3 different clusters, each on a different server platform, and all of them showed this problem.
1. Ryzen 5 3600 using PCIe 3.0 NVMe drives
2. Epyc 7252
3. Xeon E-2324G
2 & 3 used a 6G SATA RAID 1 on an Adaptec SmartRAID 3102E-8i 2GB with write-through caching

-Using Version 8.0.15020.0 & 8.0.15159.0

We used different hardware on all those setups, except for the network cards, which were the same model everywhere:
-Broadcom 25Gb 2-Port PCIe Adapter P225P
-Using 25G direct-attach cables
-We tried swapping those with different cards of the same model

The sync between the nodes seems fine; it's just node-1 accessing its cluster storage more slowly.
This issue persists even if we shut down node-2.
We also set this up using both 25G ports on those cards as sync interfaces. That split the load across both ports, but it didn't change the behavior.

-We set all of those nodes to the High performance power plan in Windows
-MPIO is installed and set up using Least Queue Depth for storage
-Both iSCSI initiators are connected to 127.0.0.1 and the partner sync interface.
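
The MPIO and iSCSI settings listed above can be read back on each node with the in-box cmdlets. This is only a read-only sketch, assuming an elevated PowerShell session and that the MPIO feature (and with it the MPIO module) is installed; it changes nothing.

```powershell
# Read-only checks; run in an elevated PowerShell session on each node.

# Default load-balance policy the Microsoft DSM applies to newly claimed disks
# (LQD = Least Queue Depth, RR = Round Robin, FOO = Fail Over Only, None = unset).
Get-MSDSMGlobalDefaultLoadBalancePolicy

# One row per iSCSI connection: 127.0.0.1 is the loopback path, the other
# target address should be the partner's interface.
Get-IscsiConnection |
    Select-Object InitiatorAddress, TargetAddress, TargetPortNumber

# Number of sessions per target IQN.
Get-IscsiSession |
    Group-Object TargetNodeAddress |
    Select-Object Name, Count

# Per-disk MPIO policy and path states as reported by the in-box mpclaim tool.
mpclaim.exe -s -d
```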

We used this example to set up those HA devices. Sync and HB are on a /29 subnet.

Code: Select all

param($addr="10.131.83.1", $port=3261, $user="root", $password="starwind",
	$addr2="10.131.83.2", $port2=$port, $user2=$user, $password2=$password,
#common
	$initMethod="Clear",
	$size=12,
	$sectorSize=512,
	$failover=0,
	$bmpType=1,
	$bmpStrategy=0,
#primary node
	$imagePath="My computer\D\starwind",
	$imageName="storage-image-n1",
	$createImage=$true,
	$storageName="",
	$targetAlias="storage-target-n1",
	$autoSynch=$true,
	$poolName="pool1",
	$syncSessionCount=1,
	$aluaOptimized=$true,
	$cacheMode="wb",
	$cacheSize=128,
	$syncInterface="#p2=172.16.32.10:3260" -f $addr2,
	$hbInterface="#p2=172.16.32.2:3260",
	$createTarget=$true,
	$bmpFolderPath="",
#secondary node
	$imagePath2="My computer\D\starwind",
	$imageName2="storage-image-n2",
	$createImage2=$true,
	$storageName2="",
	$targetAlias2="storage-target-n2",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$false,
	$cacheMode2=$cacheMode,
	$cacheSize2=$cacheSize,
	$syncInterface2="#p1=172.16.32.9:3260" -f $addr,
	$hbInterface2="#p1=172.16.32.1:3260",
	$createTarget2=$true,
	$bmpFolderPath2=""
	)
	
Import-Module StarWindX

try
{
	Enable-SWXLog -level SW_LOG_LEVEL_DEBUG

	$server = New-SWServer -host $addr -port $port -user $user -password $password

	$server.Connect()

	$firstNode = new-Object Node

	$firstNode.HostName = $addr
	$firstNode.HostPort = $port
	$firstNode.Login = $user
	$firstNode.Password = $password
	$firstNode.ImagePath = $imagePath
	$firstNode.ImageName = $imageName
	$firstNode.Size = $size
	$firstNode.CreateImage = $createImage
	$firstNode.StorageName = $storageName
	$firstNode.TargetAlias = $targetAlias
	$firstNode.AutoSynch = $autoSynch
	$firstNode.SyncInterface = $syncInterface
	$firstNode.HBInterface = $hbInterface
	$firstNode.PoolName = $poolName
	$firstNode.SyncSessionCount = $syncSessionCount
	$firstNode.ALUAOptimized = $aluaOptimized
	$firstNode.CacheMode = $cacheMode
	$firstNode.CacheSize = $cacheSize
	$firstNode.FailoverStrategy = $failover
	$firstNode.CreateTarget = $createTarget
	$firstNode.BitmapStoreType = $bmpType
	$firstNode.BitmapStrategy = $bmpStrategy
	$firstNode.BitmapFolderPath = $bmpFolderPath
    
	#
	# device sector size. Possible values: 512 or 4096(May be incompatible with some clients!) bytes. 
	#
	$firstNode.SectorSize = $sectorSize
    
	$secondNode = new-Object Node

	$secondNode.HostName = $addr2
	$secondNode.HostPort = $port2
	$secondNode.Login = $user2
	$secondNode.Password = $password2
	$secondNode.ImagePath = $imagePath2
	$secondNode.ImageName = $imageName2
	$secondNode.CreateImage = $createImage2
	$secondNode.StorageName = $storageName2
	$secondNode.TargetAlias = $targetAlias2
	$secondNode.AutoSynch = $autoSynch2
	$secondNode.SyncInterface = $syncInterface2
	$secondNode.HBInterface = $hbInterface2
	$secondNode.SyncSessionCount = $syncSessionCount2
	$secondNode.ALUAOptimized = $aluaOptimized2
	$secondNode.CacheMode = $cacheMode2
	$secondNode.CacheSize = $cacheSize2
	$secondNode.FailoverStrategy = $failover
	$secondNode.CreateTarget = $createTarget2
	$secondNode.BitmapFolderPath = $bmpFolderPath2
        
	$device = Add-HADevice -server $server -firstNode $firstNode -secondNode $secondNode -initMethod $initMethod
    
	while ($device.SyncStatus -ne [SwHaSyncStatus]::SW_HA_SYNC_STATUS_SYNC)
	{
		$syncPercent = $device.GetPropertyValue("ha_synch_percent")
		Write-Host "Synchronizing: $($syncPercent)%" -foreground yellow

		Start-Sleep -m 2000

		$device.Refresh()
	}
}
catch
{
	Write-Host $_ -foreground red 
}
finally
{
	$server.Disconnect()
}

Does anyone have an idea what might cause this issue?

Regards,
Sebastian

Sat Aug 26, 2023 8:08 pm

Hi,

CSV ownership impact on performance is related to Failover Cluster operation.
Here are some tips.
1. How do you test performance? Please see some hints here https://www.starwindsoftware.com/best-p ... practices/.
2. Make sure multiple loopback iSCSI sessions are used. Use at least 4 loopback sessions, and connect partners with at least 2x iSCSI sessions per link. A single iSCSI session can easily get overwhelmed, and multiple iSCSI sessions help improve performance. NOTE: iSCSI consumes some CPU performance.
3. No ReFS on underlying storage and StarWind HA devices. Try NTFS, which should address the CSV ownership question.
4. Disable write-back caching. For fast storage, it is not needed. Comment that line out while creating the device or disable it as described here https://knowledgebase.starwindsoftware. ... -l1-cache/.
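
For tip 1, a repeatable synthetic test usually says more than a file copy. The following is a minimal sketch of a DiskSpd run, assuming diskspd.exe is on the PATH; the test file path and the workload mix are placeholders, not values taken from the linked guide.

```powershell
# Hypothetical test file on the CSV; adjust the path and delete the file afterwards.
$testFile = 'C:\ClusterStorage\Volume1\diskspd-test.dat'

# -c10G  create a 10 GiB test file      -d60  run for 60 seconds
# -b4K   4 KiB blocks                   -r    random I/O
# -w30   30% writes, 70% reads          -t4   4 threads per target
# -o8    8 outstanding I/Os per thread  -Sh   disable software and hardware write caching
# -L     collect latency statistics
& diskspd.exe -c10G -d60 -b4K -r -w30 -t4 -o8 -Sh -L $testFile
```

Running the same command on both nodes, and again inside a VM, keeps the numbers comparable.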

General tips
1. Make sure redundant networking is used. You must have at least one link (e.g., heartbeat or synchronization) running over a different NIC. See more at https://www.starwindsoftware.com/system-requirements.
2. Make sure the manufacturer's drivers are installed.
3. Based on my experience, Broadcom NICs can misbehave under MTU 9014. Try 1514.

Finally, iSCSI itself is a bottleneck. See more https://www.starwindsoftware.com/starwi ... -of-target.

seb_alf · Mon Aug 28, 2023 1:28 pm

Hi again,

We tested mostly by copying a single big file to the cluster volume; that's where we noticed it first. One test VHDX took ~1:30 minutes on node-1 compared to ~35 seconds on node-2. If I run CrystalDiskMark in a VM I see a similar picture, no matter whether it's on our NVMe array or a SATA array. We only get ~2/3 of the read and ~1/2 of the write performance.

We discovered it's not an issue with node-1 itself; it's an issue with accessing the loopback storage volume.
Node-2 gets as slow as node-1 as soon as we disconnect the partner volume.
If we disconnect the loopback on node-1, performance rises to the expected level.

-Changing CSV and ownership across the whole cluster doesn't change anything. Performance just drops for a few seconds while ownership is transferring.
-The storage on all systems was NTFS-formatted, both on the drives and on the iSCSI volumes.
-Disabling the VSAN write cache didn't resolve it.
-Changing MTU back to 1514 also didn't resolve it.
-We used one onboard NIC for HB and the SFP expansion card for sync on all those setups to prevent issues in case of a dying PCIe card. The OS was always on an onboard NIC or an additional network card.

-For some reason, adding more iSCSI sessions reduces the performance a lot. Using 4 or more sessions makes the loopback storage barely usable.

We do not really care about the storage performance, as long as we are about as fast as a single 6G SATA SSD.
For future projects using VSAN Free and the standard edition we would not add jumbo frames or additional sessions if we don't need them. That's why I think we would be fine using the Microsoft iSCSI initiator.

Because I need to remove one of the servers from our lab today, I won't be able to test much until next week. I don't want to play around too much on our 10-person production setup.

Mon Aug 28, 2023 3:19 pm

Make sure Windows Server is up to date. Try a local-to-local connection.
Say your local IP is 172.16.10.10 and your partner IP is 172.16.10.20.
In discovery, discover the portal on one node with 172.16.10.10 as the target and 172.16.10.10 as the initiator, then repeat the same for 172.16.10.20 on the other node.
Try at least 3 local-to-local, 1 loopback, and 1 partner connection.
Use the Round Robin with Subset MPIO policy, where the local-to-local connection is set as ACTIVE, while the others are set as STANDBY.
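
The connection layout described above could be scripted roughly as follows. This is a sketch under assumptions: both IQNs are made-up placeholders (take the real ones from the StarWind console or from Get-IscsiTarget), and the example IPs are the ones from this post. It only creates the sessions; the ACTIVE/STANDBY path states still have to be set afterwards in the MPIO properties.

```powershell
# Assumptions: hypothetical IQNs; 172.16.10.10 is this node, 172.16.10.20 the partner.
$localIqn   = 'iqn.2008-08.com.starwindsoftware:node1-example-target'   # placeholder
$partnerIqn = 'iqn.2008-08.com.starwindsoftware:node2-example-target'   # placeholder
$localIp    = '172.16.10.10'
$partnerIp  = '172.16.10.20'

# Register the discovery portals: local-to-local, loopback, and partner.
New-IscsiTargetPortal -TargetPortalAddress $localIp -InitiatorPortalAddress $localIp
New-IscsiTargetPortal -TargetPortalAddress '127.0.0.1'
New-IscsiTargetPortal -TargetPortalAddress $partnerIp -InitiatorPortalAddress $localIp

# Three local-to-local sessions.
1..3 | ForEach-Object {
    Connect-IscsiTarget -NodeAddress $localIqn -TargetPortalAddress $localIp `
        -InitiatorPortalAddress $localIp -IsMultipathEnabled $true -IsPersistent $true
}

# One loopback session.
Connect-IscsiTarget -NodeAddress $localIqn -TargetPortalAddress '127.0.0.1' `
    -IsMultipathEnabled $true -IsPersistent $true

# One partner session.
Connect-IscsiTarget -NodeAddress $partnerIqn -TargetPortalAddress $partnerIp `
    -InitiatorPortalAddress $localIp -IsMultipathEnabled $true -IsPersistent $true
```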

Furthermore, slow SMB copying is a known issue of Windows Server (https://learn.microsoft.com/en-us/windo ... e-transfer). As a workaround for file copying, you can align the values of FirstBurstLength and MaxBurstLength (regedit -> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\000X\Parameters) on each node (a restart might be needed).
CrystalDiskMark is also not the best tool, IMO, to test performance (use DiskSpd as described).
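
Before editing anything in regedit, the current values can be listed read-only. The sketch below walks the numbered instance keys (the 000X part of the path) under the class key quoted above and prints both values where a Parameters subkey exists; it assumes an elevated session and makes no changes.

```powershell
$classKey = 'HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}'

Get-ChildItem -LiteralPath $classKey -ErrorAction SilentlyContinue |
    Where-Object { $_.PSChildName -match '^\d{4}$' } |      # instance keys: 0000, 0001, ...
    ForEach-Object {
        $paramKey = Join-Path $_.PSPath 'Parameters'
        if (Test-Path -LiteralPath $paramKey) {
            $p = Get-ItemProperty -LiteralPath $paramKey
            [pscustomobject]@{
                Instance         = $_.PSChildName
                FirstBurstLength = $p.FirstBurstLength
                MaxBurstLength   = $p.MaxBurstLength
            }
        }
    }
```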

seb_alf · Wed Sep 06, 2023 3:03 pm

Just want to give an update.

I got two different servers for testing.

We believe the issue we are running into might be a faulty MPIO configuration.

We selected the StarWind VSAN service and the loopback driver while installing.

1. To set up MPIO we installed the MPIO feature.
2. Set <iScsiDiscoveryListInterfaces value="1"/> in starwind.cfg.
3. Then we pressed the discover iSCSI devices button in mpiocpl. That gives us an OK and adds STARWINDSTARWIND and a generic MSFT2005 device to the list of MPIO devices.
4. After that we rebooted the system.
5. We press the use multipathing button on all storage iSCSI connections we create.
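
Steps 1 and 3 have PowerShell counterparts, which makes the setup easier to repeat identically on each node. This is a sketch of the in-box cmdlets only, run from an elevated session; it does not cover the starwind.cfg change in step 2, and whether it yields exactly the same device list as the mpiocpl button is an assumption.

```powershell
# Step 1: install the MPIO feature (a reboot may be required afterwards).
Install-WindowsFeature -Name Multipath-IO

# Step 3: let the Microsoft DSM claim iSCSI-attached disks automatically.
Enable-MSDSMAutomaticClaim -BusType iSCSI

# List the vendor/product IDs that MPIO currently claims.
Get-MSDSMSupportedHW
```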

Additionally,
I can only change MPIO policies under Disk Management; they then also change in the initiator GUI.
Changing them directly in the iSCSI initiator GUI gives an error explaining the policy.
I also can't set a connection to active or standby in this menu while using Round Robin with Subset.
Under the Disk Management MPIO tab we can't set connections to active or standby either.
Is this normal?

Wed Sep 06, 2023 7:16 pm

This is not normal and looks to be some OS-level bug to me. Have you already tried reinstalling Windows Server or installing a previous version (e.g., 2016 or 2019)?
First, try uninstalling and installing the MPIO feature back.

seb_alf · Thu Sep 07, 2023 12:34 pm

We installed Server 2022 Standard, both with Desktop Experience and as Core, multiple times on multiple systems, and we also tested Server 2019.

But I can guarantee that we get an error when choosing that policy and pressing Apply or OK in the iscsicpl menu.
https://youtu.be/azGPzOXRPRE?t=1488

I will reinstall the test cluster using freshly downloaded English versions of Windows Server.
I will also swap the Broadcom NICs for QNAP QXG-10G1T cards in this test to try a different NIC model.

Thanks for your patience.

Thu Sep 07, 2023 3:21 pm

Greetings,

Thanks for your update. Can you please try Windows Server 2019?
Please keep me posted.

seb_alf · Fri Sep 08, 2023 9:12 am

Another small question.
Right now we are using the same license file for our testing environment that we used to license our production system.
Can reinstalling and reusing the key multiple times cause licensing issues for our production system?

I don't even know whether the license can get revoked, or what happens then.

Fri Sep 08, 2023 11:37 am

Right now we are using the same license file for our testing environment that we used to license our production system.

This is a violation of the License Agreement, I believe. Users are prohibited from using the same key simultaneously at their setups unless that is outlined in the StarWind VSAN edition. Key transfer implies unregistering licenses from the previous setup where it was used.

seb_alf · Mon Sep 11, 2023 2:17 pm

I just used a StarWind VSAN trial to set up the two servers again using the GUI.

On that one I can change the MPIO policies without getting any errors.
If I use this PowerShell script to set up the devices, I get the same errors.

I copied the script from the fresh installation, using the new file I got from the StarWind trial mail.
In that script I only changed the storage name, target name, and the IPs for the NICs.
This time I also didn't remove the #p2={0}:3260, in front of it.

Code: Select all

param($addr="192.168.0.11", $port=3261, $user="root", $password="starwind",
	$addr2="192.168.0.12", $port2=$port, $user2=$user, $password2=$password,
#common
	$initMethod="Clear",
	$size=12,
	$sectorSize=512,
	$failover=0,
	$bmpType=1,
	$bmpStrategy=0,
#primary node
	$imagePath="My computer\D\starwind",
	$imageName="masterImg69",
	$createImage=$true,
	$storageName="",
	$targetAlias="targetha69",
	$autoSynch=$true,
	$poolName="pool1",
	$syncSessionCount=1,
	$aluaOptimized=$true,
	$cacheMode="wb",
	$cacheSize=128,
	$syncInterface="#p2={0}:3260,172.16.32.10:3260,172.16.32.18:3260" -f $addr2,
	$hbInterface="#p2=172.16.32.2:3260",
	$createTarget=$true,
	$bmpFolderPath="",
#secondary node
	$imagePath2="My computer\D\starwind",
	$imageName2="partnerImg69",
	$createImage2=$true,
	$storageName2="",
	$targetAlias2="partnerha69",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$false,
	$cacheMode2=$cacheMode,
	$cacheSize2=$cacheSize,
	$syncInterface2="#p2={0}:3260,172.16.32.9:3260,172.16.32.17:3260" -f $addr,
	$hbInterface2="#p2=172.16.32.1:3260",
	$createTarget2=$true,
	$bmpFolderPath2=""
	)
	
Import-Module StarWindX

try
{
	Enable-SWXLog -level SW_LOG_LEVEL_DEBUG

	$server = New-SWServer -host $addr -port $port -user $user -password $password

	$server.Connect()

	$firstNode = new-Object Node

	$firstNode.HostName = $addr
	$firstNode.HostPort = $port
	$firstNode.Login = $user
	$firstNode.Password = $password
	$firstNode.ImagePath = $imagePath
	$firstNode.ImageName = $imageName
	$firstNode.Size = $size
	$firstNode.CreateImage = $createImage
	$firstNode.StorageName = $storageName
	$firstNode.TargetAlias = $targetAlias
	$firstNode.AutoSynch = $autoSynch
	$firstNode.SyncInterface = $syncInterface
	$firstNode.HBInterface = $hbInterface
	$firstNode.PoolName = $poolName
	$firstNode.SyncSessionCount = $syncSessionCount
	$firstNode.ALUAOptimized = $aluaOptimized
	$firstNode.CacheMode = $cacheMode
	$firstNode.CacheSize = $cacheSize
	$firstNode.FailoverStrategy = $failover
	$firstNode.CreateTarget = $createTarget
	$firstNode.BitmapStoreType = $bmpType
	$firstNode.BitmapStrategy = $bmpStrategy
	$firstNode.BitmapFolderPath = $bmpFolderPath
    
	#
	# device sector size. Possible values: 512 or 4096(May be incompatible with some clients!) bytes. 
	#
	$firstNode.SectorSize = $sectorSize
    
	$secondNode = new-Object Node

	$secondNode.HostName = $addr2
	$secondNode.HostPort = $port2
	$secondNode.Login = $user2
	$secondNode.Password = $password2
	$secondNode.ImagePath = $imagePath2
	$secondNode.ImageName = $imageName2
	$secondNode.CreateImage = $createImage2
	$secondNode.StorageName = $storageName2
	$secondNode.TargetAlias = $targetAlias2
	$secondNode.AutoSynch = $autoSynch2
	$secondNode.SyncInterface = $syncInterface2
	$secondNode.HBInterface = $hbInterface2
	$secondNode.SyncSessionCount = $syncSessionCount2
	$secondNode.ALUAOptimized = $aluaOptimized2
	$secondNode.CacheMode = $cacheMode2
	$secondNode.CacheSize = $cacheSize2
	$secondNode.FailoverStrategy = $failover
	$secondNode.CreateTarget = $createTarget2
	$secondNode.BitmapFolderPath = $bmpFolderPath2
        
	$device = Add-HADevice -server $server -firstNode $firstNode -secondNode $secondNode -initMethod $initMethod
    
	while ($device.SyncStatus -ne [SwHaSyncStatus]::SW_HA_SYNC_STATUS_SYNC)
	{
		$syncPercent = $device.GetPropertyValue("ha_synch_percent")
		Write-Host "Synchronizing: $($syncPercent)%" -foreground yellow

		Start-Sleep -m 2000

		$device.Refresh()
	}
}
catch
{
	Write-Host $_ -foreground red 
}
finally
{
	$server.Disconnect()
}

I will check if the performance has changed.

Mon Sep 11, 2023 2:33 pm

Greetings,

It should not change. Try more sessions or a different MPIO policy.

seb_alf · Thu Sep 14, 2023 9:07 am

We get the same performance anomaly on the trial version.

We had all this happen on VSAN Free and the trial, with multiple systems, with different configurations, and set up by two technicians.
I talked to our contact at StarWind; he will put us in contact with a technician next week.

I will update you if we find something.

Summary of our findings:
1. Copying a big file into the VSAN is generally quite slow. A RAID 1 and a RAID 10 using 6 drives get around the same copy speed, which is weird. The good performance I initially reported on node-2 seems to have been system RAM caching.
2. Performance checks with diskspd on node-1 or in a VM on the storage seem fine until we add multiple sessions to the connections. That causes the disk performance inside VMs to get really bad (below 60 IOPS for random).
3. Adding multiple sessions also causes the file copy to get extremely slow (0-3 MB/s). This changes when we switch the server's power profile from high performance to balanced: on balanced it runs as erratically as described in 1., on high performance it's unusable. That's why I think something is causing an error that gets much worse as soon as we raise system performance.
4. Using the VSAN Free and trial PowerShell scripts for setup, on Free or trial, leaves us unable to change the MPIO policy properly. The only policy we can set without an error is failover. However, that only works if we point the active and standby paths to the primary node's storage. Setting loopback as active on the secondary node causes the same error. The HA functionality works fine: we can power off either of the systems and everything moves to or gets recovered on the remaining node.

The performance of all NICs (bandwidth and latency), RAID storage, and CPU is as expected. I confirmed that using Cinebench R23, diskspd, and iperf.

Thu Sep 14, 2023 11:18 am

Hi,

Glad to know that you were able to get in contact with the tech team. I will be happy if I can carry on working with you on this case.
Keep this thread up-to-date.

seb_alf · Mon Sep 25, 2023 9:30 am

After a great call, support fixed the performance issue we had with our trial cluster.
We had accidentally added the cluster role IP 192.168.0.10 as a sync interface instead of 172.16.32.10.
I believe I accidentally clicked that one in the GUI because of the 10 at the end.
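
A small pre-flight check can catch this kind of mix-up before a device is created. The sketch below is not part of StarWindX: Test-IpInSubnet is a hypothetical helper, and 172.16.32.8/29 is only inferred from the sync addresses .9 and .10 and the /29 mentioned in the first post.

```powershell
# Hypothetical helper: returns $true when an IPv4 address lies in the given network.
function Test-IpInSubnet {
    param(
        [Parameter(Mandatory)][string]$Address,
        [Parameter(Mandatory)][string]$Network,
        [Parameter(Mandatory)][ValidateRange(0, 32)][int]$PrefixLength
    )

    # Convert a dotted IPv4 string to its numeric value (a double is exact up to 2^32).
    function ConvertTo-IPv4Number([string]$Ip) {
        $value = [double]0
        foreach ($octet in [System.Net.IPAddress]::Parse($Ip).GetAddressBytes()) {
            $value = $value * 256 + $octet
        }
        $value
    }

    # Two addresses share a subnet when they fall into the same block
    # of 2^(32 - prefix) consecutive addresses.
    $blockSize = [math]::Pow(2, 32 - $PrefixLength)
    [math]::Floor((ConvertTo-IPv4Number $Address) / $blockSize) -eq
        [math]::Floor((ConvertTo-IPv4Number $Network) / $blockSize)
}

# Check both candidate sync addresses against the assumed sync subnet.
foreach ($ip in '172.16.32.10', '192.168.0.10') {
    $inSubnet = Test-IpInSubnet -Address $ip -Network '172.16.32.8' -PrefixLength 29
    '{0} in 172.16.32.8/29: {1}' -f $ip, $inSubnet
}
```

With these inputs the cluster role IP 192.168.0.10 is reported as outside the sync subnet, while 172.16.32.10 is inside it.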

I also showed him the PowerShell script that caused the MPIO error on trial and Free VSAN.

There is an error in the template scripts that causes it.

on primary $aluaOptimized=$true,
on secondary $aluaOptimized2=$false,

Changing both to $true fixes that.

We also fixed that on our already set up system by setting <alua_access_state>0</alua_access_state> inside the HA configuration files of the images on both nodes.
We set that in both blocks, the HA image block and the IQN block.
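
To see which blocks of a configuration file still carry a different value, the file content can be scanned for the element. This is only a sketch: Get-AluaAccessState is a hypothetical helper, and the sample XML is a made-up, heavily simplified stand-in, because the real layout of the HA configuration files is not shown in this thread. Only the alua_access_state element name comes from the fix above.

```powershell
# Hypothetical helper: lists every alua_access_state element with its parent block.
function Get-AluaAccessState {
    param([Parameter(Mandatory)][string]$XmlText)

    $doc = [xml]$XmlText
    foreach ($node in $doc.SelectNodes('//alua_access_state')) {
        [pscustomobject]@{
            Parent = $node.ParentNode.Name
            Value  = [int]$node.InnerText
        }
    }
}

# Made-up sample; the element names other than alua_access_state are placeholders.
$sample = @'
<device>
  <ha_image><alua_access_state>0</alua_access_state></ha_image>
  <iqn_block><alua_access_state>1</alua_access_state></iqn_block>
</device>
'@

# Show only the blocks that do not yet have the value 0.
Get-AluaAccessState -XmlText $sample | Where-Object Value -ne 0
```

For a real file, the text could be passed in with Get-Content -Raw. Keep a backup copy before changing a configuration file by hand.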

We can now set policies properly, and performance seems to be fine now on our Free VSAN system.