NVMe-oF NICs



mafigo74

Mon Apr 13, 2020 11:05 am

Hi,

Currently we have two Supermicro SuperServer 1029U-TN12RV on order, each with dual Mellanox ConnectX-4 Lx EN 25 GbE NICs and six 3.2 TB NVMe drives, as we were thinking of doing Storage Replica or maybe S2D.

Recently I saw your product and I think it's very interesting; I will definitely try it before deciding which path to take.

One thing that caught my eye was the NVMe-oF implementation. I know it's still experimental, and it got me wondering whether your implementation will be able to take advantage of the NVMe over Fabrics offloads supported on Mellanox ConnectX-5 EN cards. If so, I will try to future-proof my servers and see if I can still change the order to the ConnectX-5 EN model, since it's only about $100 more per card.
yaroslav (staff)

Mon Apr 13, 2020 11:42 am

Hello,

Our NVMe-oF implementation worked just great in our Hyper-V cluster. I guess you have already seen this report: https://www.starwindsoftware.com/nvme-o ... me-cluster. By the way, we also used Supermicro servers in that study, but 2029UZ-TR4+ ones. In general, NVMe-oF works on RDMA-capable NICs (which those ConnectX-4 cards are).
Here are the guides https://www.starwindsoftware.com/resour ... initiator/ and https://www.starwindsoftware.com/resour ... g-nvme-of/ for setting up the initiator and target respectively. Please note that the NVMe-oF implementation is still for testing purposes (see the general testing methodology here https://www.starwindsoftware.com/best-p ... practices/ and in the report I shared above). That means we do not recommend it for production yet.
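As a quick sanity check before following those guides, it can help to confirm that Windows actually sees the ConnectX-4 ports as RDMA-capable. Here is a minimal sketch of that check, assuming Windows Server and the built-in Get-NetAdapterRdma cmdlet (adapter names will of course differ per box):

import subprocess

# Minimal sketch: list the RDMA state of the NICs before setting up
# the NVMe-oF target/initiator. Assumes Windows Server, where the
# built-in Get-NetAdapterRdma cmdlet reports per-adapter RDMA status.
out = subprocess.run(
    ["powershell", "-NoProfile", "-Command",
     "Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize"],
    capture_output=True, text=True, check=True,
).stdout
print(out)  # every storage-facing port should report Enabled : True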

You can also request a call with one of the StarWind Pre-Sales Engineers here: https://www.starwindsoftware.com/contact-us. Our tech guys will answer your questions during the call and carry out the tests with you if needed.

P.S. Not sure if S2D can already work over NVMe-oF.
mafigo74

Mon Apr 13, 2020 1:37 pm

Hello,

I know that NVMe-oF works over RDMA; my question was specifically whether your target driver would take advantage of a NIC that does NVMe-oF offloads, because of the impact that has on performance and CPU usage.

The attached images from Mellanox make the advantages of offloading clear: CPU usage is almost 0%, and you get roughly 2.5x the performance at 4K block size.
[Attachments: Mellanox nvme-of3.png, Mellanox nvme-of4.png]
yaroslav (staff) wrote: P.S. Not sure if S2D can already work over NVMe-oF.
As far as I know, Microsoft has no support for NVMe-oF, so S2D doesn't either.

Now the more interesting question is whether we can use your driver with VSAN in a hyperconverged scenario. I suppose that is one of your goals, right?
yaroslav (staff)

Mon Apr 13, 2020 2:10 pm

Yes, the driver works in a hyperconverged scenario. The study I shared with you was carried out in a hyperconverged environment, so that serves as proof.
The driver should take advantage of a NIC that does NVMe-oF offloads. The thing is, I am not sure whether 25 Gbit/s is enough for such powerful storage.
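Just as a rough back-of-the-envelope check (a sketch only; the ~3.5 GB/s per-drive figure is an assumption based on the published sequential read spec for Micron 9300 class drives, not a measurement from your boxes):

# Rough sketch: compare aggregate local NVMe read bandwidth per node
# with the network bandwidth available for storage traffic.
drives_per_node = 6
per_drive_read_gbit = 3.5 * 8              # ~3.5 GB/s per drive -> Gbit/s (assumed spec value)
local_storage_gbit = drives_per_node * per_drive_read_gbit

nic_options_gbit = {
    "2 x 25 GbE":  2 * 25,
    "2 x 100 GbE": 2 * 100,
}

print(f"Aggregate local NVMe read: ~{local_storage_gbit:.0f} Gbit/s")
for name, gbit in nic_options_gbit.items():
    print(f"{name}: {gbit} Gbit/s "
          f"({gbit / local_storage_gbit:.0%} of local NVMe bandwidth)")

With six such drives the local flash can, in theory, deliver far more than 2x 25 GbE can carry, which is exactly why I raised the question.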
mafigo74

Mon Apr 13, 2020 5:30 pm

Hi,
yaroslav (staff) wrote: The thing is, I am not sure whether 25 Gbit/s is enough for such powerful storage.
You are absolutely right, but the gain here wouldn't be bandwidth or IOPS; for that I would add another dual-port card. The gain for me would be lower CPU utilization, which in turn could be used for compute instead of storage.

Ideally I would use a dual-port 100G ConnectX-5, but it is a PCIe 3.0 x16 card, and the server only has one x16 slot and one x8 slot, so I have to choose between card redundancy with two dual-port 25 GbE cards or performance with one dual-port 100 GbE card. With only one card I get a single point of failure for the storage. Or is there a way to use one of the 25 GbE ports as a backup for the storage? Under normal conditions VSAN would use the dual 100 GbE card for storage, but in case of hardware failure, could it be configured to fail back to one port of the 25 GbE card?
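To put some rough numbers on the slot constraint (just a sketch; the ~0.985 GB/s per PCIe 3.0 lane figure is the usual theoretical value after 128b/130b encoding, so real-world numbers will be somewhat lower):

# Rough sketch: could a dual-port 100 GbE card even push both ports at
# line rate through a single PCIe 3.0 x16 slot?
pcie3_lane_gbytes = 0.985                      # ~GB/s per PCIe 3.0 lane (theoretical)
slot_x16_gbit = 16 * pcie3_lane_gbytes * 8     # ~126 Gbit/s
slot_x8_gbit = 8 * pcie3_lane_gbytes * 8       # ~63 Gbit/s

print(f"PCIe 3.0 x16 slot : ~{slot_x16_gbit:.0f} Gbit/s")
print(f"PCIe 3.0 x8 slot  : ~{slot_x8_gbit:.0f} Gbit/s")
print("Dual-port 100 GbE : 200 Gbit/s -> limited by the x16 slot")
print("Dual-port 25 GbE  : 50 Gbit/s  -> fits comfortably even in the x8 slot")

So even in the x16 slot, a dual-port 100 GbE card would be PCIe-limited well below 200 Gbit/s, while the dual-port 25 GbE cards are nowhere near the slot limit.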
yaroslav (staff)

Mon Apr 13, 2020 6:26 pm

Hi,
mafigo74 wrote: Under normal conditions VSAN would use the dual 100 GbE card for storage
Do you plan to build a Windows cluster or an ESXi one? I'm not sure ESXi supports NVMe-oF yet.

Well, we do not recommend any teaming for iSCSI and Sync, so 1x iSCSI and 1x Sync should be good.
mafigo74 wrote: With only one card I get a single point of failure for the storage
Absolutely right!
What you can do is use a dual-port 100 GbE NIC (one port for iSCSI and the other for Sync) together with a 1 GbE port on the other card for management. With two cards, some network redundancy is achieved: if one of the NICs breaks, you still have a backup and the cluster will keep working (slowly). Learn more about the heartbeat failover strategy here: https://www.starwindsoftware.com/help/H ... ategy.html.
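Roughly speaking, the idea behind the heartbeat strategy is that when the synchronization link dies but the partner still answers over a heartbeat channel, only one node keeps serving the device, which is what prevents split-brain. A toy sketch of that concept (illustrative only, not StarWind's actual code):

# Toy sketch of the heartbeat failover idea (conceptual only): if sync
# is lost but the partner still responds over any heartbeat channel,
# only the designated node keeps the device active.
def decide_state(sync_alive: bool, heartbeat_alive: bool, is_primary: bool) -> str:
    if sync_alive:
        return "synchronized, serving I/O"
    if heartbeat_alive:
        # Partner is up, only the sync link failed: one node continues,
        # the other marks its copy as not synchronized.
        return "serving I/O" if is_primary else "not synchronized, I/O stopped"
    # Partner unreachable on every channel: assume it is down and keep
    # serving (this is the scenario extra heartbeat links help avoid).
    return "serving I/O (partner considered down)"

for sync, hb in [(True, True), (False, True), (False, False)]:
    print(f"sync={sync}, heartbeat={hb}: "
          f"primary -> {decide_state(sync, hb, True)}, "
          f"partner -> {decide_state(sync, hb, False)}")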

Once again, StarWind's NVMe-oF implementation is an experimental feature. Spending a large sum of money on a beefed-up setup running in experimental mode is a bit risky, I'd say.
You can still do a PoC with one of our Pre-Sales engineers before deploying StarWind.
mafigo74

Mon Apr 13, 2020 11:06 pm

Hi,
yaroslav (staff) wrote: Do you plan to build a Windows cluster or an ESXi one?
I'm planning a Hyper-V cluster.
yaroslav (staff) wrote: Spending a large sum of money on a beefed-up setup running in experimental mode is a bit risky, I'd say.
I think it makes sense, and I'll try to make this change if I still can. Moving from the ConnectX-4 to the ConnectX-5 is less than $400 for all the NICs. If it works, great; if it doesn't, well, you never know what the future will bring.
yaroslav (staff) wrote: You can still do a PoC with one of our Pre-Sales engineers before deploying StarWind.
I should be getting the servers by the end of this month, if corona doesn't mess things up even more, and then I will arrange a session to do some tests and see.
yaroslav (staff)

Tue Apr 14, 2020 7:14 am

Hi,

Hope to hear from you at our demo soon!
Take care and stay healthy.
mafigo74

Fri May 15, 2020 2:42 pm

yaroslav (staff) wrote: Hope to hear from you at our demo soon!
Finally the servers have arrived:

Supermicro 1029U-TN12RV
Dual Xeon 6248R
394GB RAM
2x Micron 2200 256GB NVMe
6x Micron 9300 MAX 3.2TB NVMe
3x dual-port Mellanox ConnectX-4 Lx (6x 25G ports in total)

I already have a StarWind demo license, and now I'm trying to figure out the best network configuration for StarWind and also for the cluster. Because our switches are still 10G (Ciscos), I was thinking of dedicating two ports from different cards (for redundancy) to StarWind, two ports, also from different cards, for cluster and inter-VM traffic at 25G, and two ports for traffic to external hosts/clients at 10G through the Ciscos. What do you think about this configuration? Or do you think I should dedicate more ports to StarWind, and if so, can I use some of those ports for cluster traffic as well? Something like the sketch below is what I have in mind.
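To make the plan concrete (port names are made up, just to show the split across the three physical cards and to check which roles keep card-level redundancy):

# Sketch of the intended port layout (hypothetical card/port names),
# with a quick check that each role is spread across two physical cards.
plan = {
    "StarWind (iSCSI/Sync)":    [("card1", "port1", 25), ("card2", "port1", 25)],
    "Cluster / inter-VM":       [("card1", "port2", 25), ("card2", "port2", 25)],
    "External via 10G Ciscos":  [("card3", "port1", 10), ("card3", "port2", 10)],
}

for role, ports in plan.items():
    cards = {card for card, _, _ in ports}
    total_gbit = sum(speed for _, _, speed in ports)
    redundancy = "yes" if len(cards) > 1 else "no (single card)"
    print(f"{role}: {total_gbit} Gbit/s total, card redundancy: {redundancy}")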
Michael (staff)

Sat May 16, 2020 5:22 am

Hello mafigo74,
Two segregated interfaces for StarWind VSAN should be enough. Since you are using the trial version, I would recommend contacting your account manager to schedule a remote session with our engineers to review the configuration and apply the best settings; it's free of charge!