Hi folks, I am connected to NVMe-oF devices on a remote enterprise array over a RoCE fabric (Arista switches), have applied what I believe are the Mellanox best practices for lossless RoCEv2 on Windows, and have configured MPIO.
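For context, this is roughly how I've been sanity-checking the RDMA/DCB state against the Mellanox guidance. It only calls stock Windows cmdlets; the little Python wrapper around them is just my own illustrative helper, not anything from StarWind or Mellanox:

# Dump the Windows RDMA/DCB state relevant to lossless RoCEv2.
# Stock Windows cmdlets only; the wrapper itself is illustrative.
import subprocess

CHECKS = [
    "Get-NetAdapterRdma",              # RDMA enabled on the Mellanox ports?
    "Get-NetQosPolicy",                # QoS policies tagging the storage traffic?
    "Get-NetQosFlowControl",           # PFC enabled only on the expected priority?
    "Get-NetQosTrafficClass",          # ETS / traffic-class layout
    "Get-NetQosDcbxSetting",           # DCBX willing setting
    "Get-NetAdapterAdvancedProperty -DisplayName '*RoCE*'",  # RoCE mode on the NIC
]

for cmd in CHECKS:
    print(f"\n### {cmd}")
    subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=False)

Everything there looks consistent with the lossless-RoCEv2 recommendations as far as I can tell, which is why I suspect something else on the Windows side.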
However, the minimum latency I am able to achieve (4k reads at 1 thread / QD1) is in the >250us range, which is >4x higher than the same server and same adapters running Linux. Similarly, the maximum 4k read IOPS I can achieve is ~2.6M, vs ~5.4M on Linux.
I know not to expect identical performance, but a 2-4x gap is too wide to accept. I have seen reviews of the StarWind initiator reporting sub-20us latencies, so I have to assume there is a set of known best-practice settings/configs that I haven't applied yet which could narrow the gap.
Windows settings, NIC config, StarWind options... I'm all ears for any best-practice recipes you folks have followed. Let me know, thanks!