NVMe upgrade, Poor Speed

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

chell
Posts: 48
Joined: Mon Dec 11, 2017 1:19 am

Tue Mar 19, 2024 8:32 am

Hi All

I'm getting unusual speed test results when replication is enabled. I assume I have something wrong in my configuration that isn't as noticeable when you're using mechanical drives.

I have two Intel servers with dual Xeon Silver 4110 CPUs, 192 GB RAM, and four Micron U.3 NVMe drives each, running Windows Server 2016.
Using CrystalDiskMark, each Micron NVMe drive reports 3309 MB/s read and 3486 MB/s write sequential, and 1677 MB/s read and 1322 MB/s write 4K random. The drives can go faster; it's the servers that are holding them back.

Using a single iSCSI session to 127.0.0.1 and no cache, the StarWind device runs at about 2/3 of the speed above for sequential and 1/5 for random.
I added another iSCSI session to 127.0.0.1 and got that up to 3098 MB/s read and 1382 MB/s write sequential, and 799 MB/s read and 427 MB/s write 4K random, still with no cache. With write-back cache, I got 4028 MB/s read and 3222 MB/s write sequential, and 761 MB/s read and 446 MB/s write 4K random.

The problem is that when the StarWind device is converted to a high-availability device, the write speed drops through the floor. This is with two hosts and a single Micron NVMe drive in each.
Using the same two iSCSI sessions to 127.0.0.1 as in the previous test, the best I could get is 6689 MB/s read and 628 MB/s write sequential, and 1103 MB/s read and 323 MB/s write 4K random, using write-back cache. I have tried different combinations of cache and iSCSI connections to one host or both hosts. In short, the read speed is affected by the different configurations, but the write speed is always in the 300-630 MB/s range. I also tried 4 x Micron NVMe drives in RAID 0 on each host and still got similar results.

To make things worse, I found that adding the device to Windows Cluster Shared Volumes drops the read speed on the node that doesn't own the volume to 1/10 of the owner's speed. The owner node achieves 6689 MB/s read and 628 MB/s write sequential; the non-owner node gets around 668 MB/s read and 628 MB/s write sequential. Changing the ownership of the volume results in the new owner reading at full speed and the former owner dropping to 1/10 speed.
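
For reference, here is roughly how I have been flipping ownership and checking the CSV state in PowerShell (the disk and node names are placeholders for my setup):

    # Show which node owns each CSV and whether I/O is direct or redirected
    Get-ClusterSharedVolume | Get-ClusterSharedVolumeState

    # Hand ownership to the other node
    Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "Node2"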

Can anyone shed some light?
Much appreciated.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Tue Mar 19, 2024 10:37 am

First, thanks for your post.
I assume I have something wrong in my configuration that isn't as noticeable when you're using mechanical drives.
Nope. It's just an iSCSI thing :)
A single session cannot reach the disk's full performance. For HDDs it can; for SSDs you need AT LEAST 2x loopback sessions and 2x partner sessions. To go even further, try different MPIO policies where one type of connection (e.g., only local or only partner) is preferred.
That's why I can't wait for NVMe-oF to get into the StarWind CVM.
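In the meantime, a rough sketch of what adding the sessions looks like in PowerShell (the target IQN is a placeholder, and this assumes the Multipath-IO feature is already installed):

    # Let MPIO claim iSCSI devices (one-time change, reboot required)
    Enable-MSDSMAutomaticClaim -BusType iSCSI

    # Register the loopback portal and add two sessions to the same target
    New-IscsiTargetPortal -TargetPortalAddress 127.0.0.1
    1..2 | ForEach-Object {
        Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:target1" `
            -TargetPortalAddress 127.0.0.1 -IsMultipathEnabled $true -IsPersistent $true
    }

    # Spread I/O across all sessions; LQD (Least Queue Depth) is another option
    Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR

Repeat the Connect-IscsiTarget calls against the partner host's data IP to get the partner sessions.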
Using CrystalDiskMark, each Micron NVMe drive reports 3309 MB/s read and 3486 MB/s write sequential, and 1677 MB/s read and 1322 MB/s write 4K random. The drives can go faster; it's the servers that are holding them back.
Do you have those disks in a RAID, or are they exposed to StarWind HA devices as individual volumes?
First of all, CrystalDiskMark is not the best software for testing, well, anything. Its parameters are quite limited, and the queue depth is just as constrained; there are too few knobs to tweak to squeeze maximum performance out of anything.
Try ATTO, FIO, or DiskSpd. See the methodology and best practices: https://www.starwindsoftware.com/best-p ... practices/.
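For example, a DiskSpd run along these lines lets you control block size, thread count, and queue depth (the drive letter and file size are placeholders):

    # 60s of 4K random I/O, 70/30 read/write, 8 threads, 32 outstanding I/Os per
    # thread, software and hardware caching disabled, latency stats included
    diskspd.exe -b4K -d60 -t8 -o32 -r -w30 -Sh -L -c10G X:\test.dat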
To make things worse, I found that adding the device to Windows Cluster Shared Volumes drops the read speed on the node that doesn't own the volume to 1/10 of the owner's speed.
Yes, CSVFS eats up some of the performance, but typically no more than 20%. Ownership might also play a role, but generally that impact comes from:
1. Networks. If they are a bottleneck, CSV ownership impacts performance.
2. Filesystem. In my experience, ReFS kills performance unless it is on Storage Spaces. Try NTFS here.
3. Antivirus. Even Defender might be a troublemaker (see the sketch below).
Add the general iSCSI bottleneck on top of that and you get a system where performance is not good.
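For point 3, something like this excludes the CSV namespace and the StarWind service process from Defender's real-time scanning (the path and process name are what I'd expect on a default install; adjust to yours):

    # Keep Defender away from the CSV mount points and the StarWind service
    Add-MpPreference -ExclusionPath "C:\ClusterStorage"
    Add-MpPreference -ExclusionProcess "StarWindService.exe"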

Let me know if any of these helps.