
"hardware error" w. Starwind NVMe-oF initiator Linux Target

Posted: Mon Mar 29, 2021 1:16 pm
by mattaw
Good morning,

This is probably a hardware or RoCEv2 network configuration issue, but I am not sure how to debug it further.

Setup:
-------
Linux Debian 10.8, Supermicro Xeon platform
Linux mythmaster 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux
Kernel drivers for ConnectX-5
NVMe-oF target from the kernel
ConnectX-5 CX556A-ECAT, latest firmware, in an x16 PCIe 3 slot

Windows 10 Pro 20H2, AMD Ryzen 3600 X570 platform
Mellanox WinOF-2 2.60
NVMe-oF initiator, the latest from StarWind
ConnectX-5 CX556A-ECAT, latest firmware, in an x4 PCIe 4 slot (link runs at PCIe 3)

The two adapters are connected directly to each other, with no switch.

Testing:
---------
StarWind rping runs for minutes with -V (verify) without error.
I am unsure how to use StarWind rperf correctly to stress the link, but it seems to work with all the settings I gave it.
The RDMA counters on Windows seem to show correct behavior and no dropped RDMA frames, but I am no expert and information on how to diagnose this is thin.

Failing test software:
----------------------
ATTO 4.01.0f1 with Direct I/O enabled (it works with Direct I/O disabled)

It runs for several tests and then fails with the following message in the Windows event log:
Example error: The IO operation at logical block address 0x7835ec28 for Disk 3 (PDO name: \Device\000000aa) failed due to a hardware error.

Linux shows no messages at all, so I assume the fault is somewhere on the Windows side. Thoughts?
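
For anyone wanting to double-check the Linux side beyond the kernel log, here is a minimal Python sketch that diffs the RDMA port counters around a test run. It assumes the ConnectX-5 port appears as mlx5_0 port 1 under /sys/class/infiniband; the device name, the port number, and the helper names snapshot and changed are placeholders of mine, not part of any StarWind or Mellanox tool.

```python
from pathlib import Path

# Assumption: the adapter is mlx5_0 and the cabled port is 1.
# Check `ls /sys/class/infiniband` and adjust both paths if not.
COUNTER_DIRS = [
    Path("/sys/class/infiniband/mlx5_0/ports/1/counters"),
    Path("/sys/class/infiniband/mlx5_0/ports/1/hw_counters"),
]


def snapshot(dirs=COUNTER_DIRS):
    """Read every integer-valued counter file found in the given directories."""
    values = {}
    for d in dirs:
        d = Path(d)
        if not d.is_dir():
            continue  # directory missing on this system, skip it
        for f in sorted(d.iterdir()):
            try:
                values[f"{d.name}/{f.name}"] = int(f.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric entry, skip it
    return values


def changed(before, after):
    """Return the delta for each counter whose value differs between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Hypothetical usage around one ATTO run:
#   before = snapshot()
#   ... run the failing ATTO test on the Windows side ...
#   for name, delta in sorted(changed(before, snapshot()).items()):
#       print(f"{name}: {delta:+d}")
```

Traffic counters will always move during a run, so the interesting entries are the error and retry style counters. If none of those change while Windows logs the hardware error, that would support the idea that the Linux side is not seeing the fault.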

Re: "hardware error" w. Starwind NVMe-oF initiator Linux Target

Posted: Mon Mar 29, 2021 3:05 pm
by yaroslav (staff)
Hi,

Please log a call with us by sending an email to support@starwind.com. Use this thread as a reference.

Re: "hardware error" w. Starwind NVMe-oF initiator Linux Target

Posted: Mon Mar 29, 2021 5:37 pm
by mattaw
I really appreciate you taking a look at this. Case opened and logs uploaded.

Matthew

Re: "hardware error" w. Starwind NVMe-oF initiator Linux Target

Posted: Mon Mar 29, 2021 6:03 pm
by yaroslav (staff)
We will need more logs from you. Let us continue working on this in the support case.