Unable to discover targets over TCP using the NVMe-oF initiator

stan
Posts: 5
Joined: Thu Sep 04, 2025 9:08 am

Thu Sep 04, 2025 9:16 am

I've been unable to set up a connection using TCP. I've managed it using RDMA, but I would like to try out NVMe-oF over TCP as well.

The target is an Ubuntu server. Running

Code:

nvme discover
shows it as being up.
However, configuring the Windows client through the initiator's GUI throws an error. The following can be found in the log.

Code:

2025-09-04 09:29:39.8201 00014 INFO     | [CliCmdTool]: Discovering of targets started 
2025-09-04 09:29:39.8201 00014 INFO     | [CliCmdTool]: Adapter ''. Discovering targets on '1.1.1.1:8009' from local address '1.1.1.2' 
2025-09-04 09:29:54.8403 00014 ERROR    | [CliCmdTool]: Failed to discover targets. Error: Shell command 'StarNVMeoF_Ctrl.exe -ttcp -j discovery 1.1.1.1:8009 1.1.1.2 nqn.2008-08.com.starwind:vertigo-dsk-015' failed due to timeout '15000'. 
2025-09-04 09:29:54.8403 00014 ERROR    | [VmPortals]: Failed to discover on portal : 'ICliCmdTool' error 
I've tried running the discovery via the CLI, but in that case the command prompt simply hangs.
The Windows client is on Windows 10 (ver 1909) (Build 18363.2274).
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Thu Sep 04, 2025 9:47 am

The issue looks to be somewhere on the network side of the setup.
Please try using private IPs. Make sure that no firewall is in the way and that the MTU matches on both ends. Also make sure that the target supports NVMe-oF over TCP.
See the sample commands here https://www.starwindsoftware.com/resour ... 1a21b64f13.
stan
Posts: 5
Joined: Thu Sep 04, 2025 9:08 am

Wed Sep 24, 2025 12:12 pm

Hi, thanks for the quick reply. It has been a bit since I worked on this.

The IPs are local. As a test, I connected the NICs directly, without a switch in between.
I checked the MTU: both the Ubuntu server (target) and the Windows host (initiator) have it set to 1500 on the adapters.
Additionally, as a test, I disabled the firewall on both machines.

The link with sample commands you provided seems to be for Windows Server. The host is running Windows Pro for Workstations, not Windows Server.
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 24, 2025 1:31 pm

Thanks for your input. Please try with the target NQN.
Also make sure that the target allows communication with the initiator NQN.
stan
Posts: 5
Joined: Thu Sep 04, 2025 9:08 am

Wed Sep 24, 2025 2:08 pm

I don't quite understand what you mean by `Please try with the target NQN`. Could you elaborate?

On the target, I've set it to allow any host by using

Code:

 echo 1 | sudo tee /sys/kernel/config/nvmet/subsystems/$SUBSYSTEM_NAME/attr_allow_any_host
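The setting above can also be read back to confirm it took effect. The following is only a minimal sketch, assuming Python is available on the target and that configfs is mounted at the usual /sys/kernel/config location; the function names are hypothetical, and the `root` parameter exists only so the check can be tried against a scratch directory.

```python
from pathlib import Path
from typing import List

# Usual configfs location of the Linux nvmet target (assumed default mount point).
NVMET_ROOT = Path("/sys/kernel/config/nvmet")


def allow_any_host_enabled(subsystem_nqn: str, root: Path = NVMET_ROOT) -> bool:
    """Return True if the subsystem's attr_allow_any_host reads back as 1."""
    attr = root / "subsystems" / subsystem_nqn / "attr_allow_any_host"
    return attr.read_text().strip() == "1"


def ports_exporting(subsystem_nqn: str, root: Path = NVMET_ROOT) -> List[str]:
    """List the port IDs whose subsystems/ directory links to this subsystem."""
    ports_dir = root / "ports"
    if not ports_dir.is_dir():
        return []
    return sorted(
        port.name
        for port in ports_dir.iterdir()
        if (port / "subsystems" / subsystem_nqn).exists()
    )
```

Reading the real configfs tree typically needs root. An empty list from `ports_exporting` would suggest the subsystem is not linked to any port, which is a separate condition from the allow-any-host flag.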
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 24, 2025 7:33 pm

Thanks for your update.
Please try using the target NQN for discovery. The output looks like you are using the initiator NQN.
stan
Posts: 5
Joined: Thu Sep 04, 2025 9:08 am

Thu Sep 25, 2025 10:02 am

Hi, thanks for explaining.

I've been trying to do this using the command line now. For discovery, there does not seem to be a flag to pass the target NQN.

Code:

StarNVMeoF_Ctrl discovery <target_addr[:port]> <local_addr> [<HostNQN> [<queueDepth>]]
Using either the discovery or the insert command hangs the command-line tool. Any attempts to run commands in a new console also hang. After a reboot, commands such as StarNVMeoF_Ctrl status work fine, but once I try to run either the discovery or the insert command, the status command also gets stuck.

I've tried to run the following insert and discovery commands:

StarNVMeoF_Ctrl.exe -ttcp insert 1.1.1.1:4420 1.1.1.2 testnqn
StarNVMeoF_Ctrl.exe -ttcp discovery 1.1.1.1:4420 1.1.1.2
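Because these commands hang the console, it may help to launch them from a wrapper that gives up after a fixed time. This is a rough sketch, not a StarWind-provided tool: the argument order follows the usage line quoted above, the `-ttcp` switch and executable name are taken from the commands in this thread, and the two function names are hypothetical.

```python
import subprocess
from typing import List, Optional, Sequence


def build_discovery_argv(target: str, local_addr: str,
                         host_nqn: Optional[str] = None,
                         exe: str = "StarNVMeoF_Ctrl.exe") -> List[str]:
    """Assemble the TCP discovery command in the form quoted in this thread."""
    argv = [exe, "-ttcp", "discovery", target, local_addr]
    if host_nqn is not None:
        # Per the usage line, the optional third argument is the host NQN.
        argv.append(host_nqn)
    return argv


def run_with_timeout(argv: Sequence[str],
                     timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run argv; return None if it does not finish within timeout_s seconds."""
    try:
        return subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_timeout(build_discovery_argv("1.1.1.1:4420", "1.1.1.2"), 15)` would mirror the timeout of 15000 seen in the GUI log, assuming that value is in milliseconds. Whether a process stuck inside a driver call can actually be terminated this way is not something the sketch can guarantee.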

I know the IP address and subNQN name are correct because discovery works locally on the target. This is the output I get on the target:

Code:

trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  1
trsvcid: 4420
subnqn:  testnqn
traddr:  1.1.1.1
eflags:  none
sectype: none
From the initiator host (1.1.1.2), I can ping the target (1.1.1.1).
I've also verified that the target is listening on the correct port (4420).
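To avoid transcription mistakes when carrying these values over to the initiator, the discovery log entry can be turned into the address:port and subNQN pair mechanically. A small sketch follows, assuming the key/value layout shown in the output above; the function names are made up for illustration.

```python
from typing import Dict, Tuple

# Discovery log entry as posted above (output of `nvme discover` on the target).
SAMPLE = """\
trtype:  tcp
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  1
trsvcid: 4420
subnqn:  testnqn
traddr:  1.1.1.1
eflags:  none
sectype: none
"""


def parse_discovery_entry(text: str) -> Dict[str, str]:
    """Split 'key: value' lines into a dict, cutting at the first colon only."""
    entry: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            entry[key.strip()] = value.strip()
    return entry


def tcp_endpoint(entry: Dict[str, str]) -> Tuple[str, str]:
    """Return ('addr:port', subnqn) for a TCP entry; raise for other transports."""
    if entry.get("trtype") != "tcp":
        raise ValueError("entry is not a TCP transport: %r" % entry.get("trtype"))
    return "%s:%s" % (entry["traddr"], entry["trsvcid"]), entry["subnqn"]


if __name__ == "__main__":
    print(tcp_endpoint(parse_discovery_entry(SAMPLE)))
    # prints ('1.1.1.1:4420', 'testnqn')
```

Cutting at the first colon only keeps values that themselves contain colons, such as an IPv6 traddr, intact.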
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Thu Sep 25, 2025 11:53 am

Thanks for checking.
Can you please triple-check that the MTU is 1500 (1514 for Windows) and make sure that the drivers and NIC firmware are up to date?
stan
Posts: 5
Joined: Thu Sep 04, 2025 9:08 am

Fri Sep 26, 2025 7:45 am

Linux gives the following:

Code:

4: enp2s0d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 9c:dc:71:40:cd:01 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.1/24 scope global enp2s0d1
       valid_lft forever preferred_lft forever
    inet6 fe80::9edc:71ff:fe40:cd01/64 scope link
       valid_lft forever preferred_lft forever
As an additional test, I uninstalled Windows and installed Linux on the initiator machine. Using Linux, the machine can connect without any issues using NVMe-oF over TCP. So I think it's safe to conclude that the target is not the issue here.

I then swapped back to Windows 10 Pro 22H2, where the initiator once again failed.

On Windows, I ran netsh int ip show int

This gave me back an MTU of 1500 for all interfaces except the Loopback Pseudo-Interface.
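On the "1500 (1514 for Windows)" remark: one plausible reading, stated here as an assumption rather than something confirmed in this thread, is that some Windows NIC driver settings express the packet size including the 14-byte Ethernet header, while Linux and netsh report the IP-level MTU. The sketch below extracts the MTU from a line of `ip` output and shows the arithmetic; the function names are hypothetical.

```python
import re

# Destination MAC (6) + source MAC (6) + EtherType (2); no VLAN tag, no FCS.
ETHERNET_HEADER_BYTES = 14


def mtu_from_ip_output(line: str) -> int:
    """Extract the MTU value from a line of `ip addr` / `ip link` output."""
    match = re.search(r"\bmtu (\d+)\b", line)
    if match is None:
        raise ValueError("no 'mtu <n>' field found")
    return int(match.group(1))


def frame_size(mtu: int) -> int:
    """Frame size including the Ethernet header, excluding FCS and VLAN tag."""
    return mtu + ETHERNET_HEADER_BYTES


if __name__ == "__main__":
    line = ("4: enp2s0d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 "
            "qdisc mq state UP group default qlen 1000")
    print(mtu_from_ip_output(line), frame_size(mtu_from_ip_output(line)))
    # prints 1500 1514
```

Under that reading, an MTU of 1500 on Linux and in netsh corresponds to a 1514-byte setting in a driver property that counts the header, so the two values would describe the same configuration.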

I additionally updated the drivers on the Windows host to the latest version (from 2020; it's an older ConnectX-3 card).

On Windows, I also used PortQry to see if the remote port was reachable, and it reported the target port (4420) as LISTENING. Pinging the target IP also works just fine.
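A PortQry-style check can be reproduced with a few lines of Python, which also allows binding to the specific local address the initiator would use. This is a sketch under the assumption that Python is installed on the Windows host; `tcp_port_open` is a hypothetical helper name.

```python
import socket
from typing import Optional


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0,
                  source_addr: Optional[str] = None) -> bool:
    """Return True if a TCP connection to host:port completes within timeout_s."""
    # Port 0 lets the OS pick an ephemeral source port on the chosen local address.
    source = (source_addr, 0) if source_addr else None
    try:
        with socket.create_connection((host, port), timeout=timeout_s,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For the setup described here, the call would be `tcp_port_open("1.1.1.1", 4420, source_addr="1.1.1.2")`. A successful result only shows that a listener accepts TCP connections; it does not exercise the NVMe/TCP connection initialization (the ICReq/ICResp exchange) that follows the TCP handshake.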

I had this working with NVMe-oF over RDMA, which worked great, but it fails when trying TCP.
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Mon Sep 29, 2025 3:37 pm

We are working on this case internally. I will keep you posted.
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Thu Oct 16, 2025 8:25 am

The fix should come in a future build. Please keep an eye on the release notes: https://www.starwindsoftware.com/release-notes-build
crypope
Posts: 1
Joined: Thu Nov 06, 2025 10:08 pm

Thu Nov 06, 2025 11:39 pm

I have exactly the same problem and have tried everything suggested in this thread, with the same results. Discovery for NVMe/TCP on Windows 2025 & 2019 gets stuck and fails. Can I be notified as well once the new build that fixes this is out? Access to more verbose logging might help as well, but I'm not sure how to enable it. Thanks.
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Fri Nov 07, 2025 6:52 am

Verbose logging is available, but it requires a debug build. We don't supply a debug build without management approval, though.
Sadly, we don't notify users of releases via email. You can keep an eye on the release notes yourself.
From what I know, the issue has been fixed in the next build. Thanks for your cooperation.
frollo
Posts: 4
Joined: Wed Nov 26, 2025 5:43 pm

Wed Nov 26, 2025 6:24 pm

I have this issue too. I'm trying to do NVMe-oF from TrueNAS SCALE. The endpoints are available and working; however, when attempting to connect via TCP, there appears to be a handshake error. According to AI, this is a known issue:

https://www.reddit.com/r/truenas/commen ... vmeofroce/
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Wed Nov 26, 2025 7:14 pm

Welcome to StarWind Forum.
Does connectivity over RDMA work well? Does the older build I shared work?
Are there any CHAP settings in place or some Access rights settings on the target side? In other words, is there any list of allowed initiators on the target side?