Best practices for an NVMe-based system

Software-based VM-centric and flash-friendly VM storage + free version


softmaster
Posts: 24
Joined: Sat Jan 07, 2017 10:30 pm

Mon Jan 06, 2025 11:44 am

Hi
I installed the StarWind appliance on a server with 10 NVMe disks and dual Mellanox 100 Gb/s cards. I created raw pools and configured access via NVMe over TCP.
I ran into the following problem. A pool on a single physical disk delivers practically the maximum available performance: 3 GB/s sequential read, 1.5 GB/s sequential write, and about 160,000 IOPS for random read/write. But every attempt to build any kind of RAID or ZFS pool shows poor performance.
RAID5 across 10 disks performed worse than a single disk. Even RAID0 across 3 disks was slightly slower than a single disk in sequential writes and almost the same in the other tests.
And this is not a problem of the communication channels or the performance of the server itself: testing two pools simultaneously, each on its own physical disk, yields almost the full performance of each disk and nearly double the total throughput. But combining several disks into one pool by any of the offered methods significantly degrades performance.
I understand that it is not your system that builds the RAID and ZFS; you use standard OS tools. But do you have any real, successful experience building a system on NVMe disks? What are the best practices?
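For reference, the layouts I tested correspond to manual mdadm arrays roughly like these (my assumption about what the appliance tooling does under the hood; device names are placeholders):
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
mdadm --create /dev/md1 --level=5 --raid-devices=10 /dev/nvme{0..9}n1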
Regards
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Mon Jan 06, 2025 12:09 pm

Hi,

Could you please let me know what the performance of the block device looks like after the RAID initialization is completed?
softmaster
Posts: 24
Joined: Sat Jan 07, 2017 10:30 pm

Mon Jan 06, 2025 12:17 pm

Single disk:
[Read]
SEQ 1MiB (Q=8, T=1): 3321 MB/s
RND 4KiB (Q=32, T=16): 156732 IOPS
[Write]
SEQ 1MiB (Q=8, T=1): 1413 MB/s
RND 4KiB (Q=32, T=16): 165087.6 IOPS

RAID0, 3 disks:
[Read]
SEQ 1MiB (Q=8, T=1): 3631 MB/s
RND 4KiB (Q=32, T=16): 149748 IOPS
[Write]
SEQ 1MiB (Q=8, T=1): 1123 MB/s
RND 4KiB (Q=32, T=16): 159210 IOPS

RAID5, 10 disks:
[Read]
SEQ 1MiB (Q=8, T=1): 3112 MB/s
RND 4KiB (Q=32, T=16): 138818 IOPS
[Write]
SEQ 1MiB (Q=8, T=1): 850 MB/s
RND 4KiB (Q=32, T=16): 127118 IOPS
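These are CrystalDiskMark-style patterns; roughly equivalent fio runs, for anyone who wants to reproduce them (a sketch, with /dev/md0 standing in for whichever block device is under test):
fio --name=seqread --filename=/dev/md0 --direct=1 --ioengine=libaio --rw=read --bs=1M --iodepth=8 --numjobs=1 --runtime=30 --time_based --group_reporting
fio --name=randwrite --filename=/dev/md0 --direct=1 --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 --numjobs=16 --runtime=30 --time_based --group_reporting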
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Mon Jan 06, 2025 12:35 pm

Could you please double-check that the MDADM initialization has finished?
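You can watch the progress with:
cat /proc/mdstat
An in-progress resync shows a percentage and an estimated finish time next to the array.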
Tell me more about the system.
See similar threads
1. https://www.reddit.com/r/linuxquestions ... _low_iops/
2. https://forum.level1techs.com/t/extreme ... 5/200677/6
3. This one points to the methodology https://askubuntu.com/questions/1169873 ... drive-alon.
softmaster
Posts: 24
Joined: Sat Jan 07, 2017 10:30 pm

Mon Jan 06, 2025 1:01 pm

It was finished. Your system doesn't allow creating the LUN until synchronization is finished... I waited about 5 or 6 hours for the RAID5 to synchronize...
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Mon Jan 06, 2025 1:44 pm

Thanks for your update.
Nope. You can create a LUN while RAID synchronization is still running.
You could try the following tweaks:
Create a file /etc/udev/rules.d/60-md-stripe-cache.rules
Add the following contents:
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", ATTR{md/group_thread_cnt}="6"
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", ATTR{md/stripe_cache_size}="512"
Reboot the system.
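For context: group_thread_cnt adds kernel worker threads for RAID5 stripe handling, and stripe_cache_size enlarges the stripe cache (its memory cost is roughly page size x stripe_cache_size x number of disks). After the reboot, you can confirm the values were applied, assuming your array is md0:
cat /sys/block/md0/md/group_thread_cnt
cat /sys/block/md0/md/stripe_cache_size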
Could you please also let me know if you are using StarWind highly available NVMe-oF targets in your system?

Please also let me know more about the system. Is it a bare-metal or a VM-based StarWind VSAN deployment? What system settings and which hypervisor do you use?
Please also review the threads I shared earlier.
softmaster
Posts: 24
Joined: Sat Jan 07, 2017 10:30 pm

Mon Jan 06, 2025 7:12 pm

Thanks. I'll try.
Let me return to my question.
Do you have experience with NVMe-based systems? What performance can I expect?
Regards
E.
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Mon Jan 06, 2025 10:46 pm

Yes, I do, and so do StarWind techs in general.
To help you more, I would like to understand whether the target is connected over NVMe-oF or iSCSI.
softmaster
Posts: 24
Joined: Sat Jan 07, 2017 10:30 pm

Tue Jan 07, 2025 7:34 am

Hello
I would like to point out that I really appreciate your response speed and attention to my requests.
Let me describe my setup.
Dell R640 server with two Gold 6140 processors and 768 GB RAM.
ESXi 7 installed.
StarWind appliance installed as a VM. The Mellanox dual-port ConnectX-4 100 Gb/s card is passed through to the VM as a PCI device, and the 10 NVMe disks are passed through as PCI devices as well.
Pool created as RAW.
The client is the same ESXi server with the NVMe-over-TCP software adapter. BTW, I tested iSCSI as well.
Two paths configured with the IOPS=1 round-robin load-balancing policy (from my previous tests this provides some additional performance).
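For reference, that policy is applied per device with something like the following (the naa identifier here is a placeholder):
esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=naa.xxxxxxxxxxxxxxxx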
By the way, I am also testing other SDS solutions, so if you are interested I can share my results with you.
E.
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Tue Jan 07, 2025 9:46 am

Hi,
You are always welcome.
As NVMe-oF is still in beta, we are actively collecting feedback and results, so I can hardly share any performance expectations for such a system. Also, as part of the beta program, please reach out to us at https://www.starwindsoftware.com/support-form so we can work on this system closely. Use this thread and 1263095 as your references.

P.S. Passing through the disks and adapters is a great idea.
vmware
Posts: 35
Joined: Sun Apr 17, 2022 5:08 am

Wed Jan 08, 2025 4:18 am

softmaster wrote:
Mon Jan 06, 2025 7:12 pm
Thanks. I'll try.
Let me return to my question.
Do you have experience with NVMe-based systems? What performance can I expect?
Regards
E.
I know a method to build a high-performance NVMe disk array. If you want to test it, you can leave your TG (Telegram) contact info.

By the way, how can one test NVMe-oF on StarWind? Has a test build been released yet?
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Wed Jan 08, 2025 8:15 am

By the way, how can one test NVMe-oF on StarWind? Has a test build been released yet?
Oh, now I see. There is a misunderstanding.
Earlier, I asked whether you had created a StarWind NVMe-oF target, and you confirmed it. That's why I was under the impression that you had managed to get the build from someone on the team as an exception, and that's why this case was very interesting to me.
The new build is going through its QA cycle, and I expect NVMe-oF to be there sometime in March.
I will put you on the Beta waitlist.
Returning to the matter at hand: the RAW volume is of no use for now. StarWind VSAN currently allows only iSCSI targets, and those need a regular XFS or EXT4 volume.
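For example, preparing an MD array for an iSCSI target would look roughly like this (a sketch; the device name and mount point are assumptions):
mkfs.xfs /dev/md0
mkdir -p /mnt/starwind-lun
mount /dev/md0 /mnt/starwind-lun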

Disk performance does seem to deteriorate under MDADM; there are plenty of threads about that.
We can try tweaking MDADM, though. Could you please take a look at the tweaks I shared earlier?

P.S. It would be nice if we could stick to a single thread (i.e., the support case or here) so it is easier for my colleagues and me to help you.
logitech
Posts: 33
Joined: Sun Feb 04, 2024 9:50 am

Thu Jan 23, 2025 9:58 am

Hello,

I’ve been exploring NVMe-oF for some time now and wanted to know if the paid version of StarWind NVMe-oF is still in beta, or if it’s fully production-ready. Any insights or updates on this would be appreciated!

Thank you.
yaroslav (staff)
Staff
Posts: 3220
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 23, 2025 10:24 am

Hi,

Sadly, it's still in beta (we have not released the build with NVMe-oF capability yet). If you are already enlisted for the beta, we will reach out to you once we have it ready.