VSAN, ESXi & constant "paths down"


p_k
Posts: 3
Joined: Thu Aug 15, 2019 6:38 pm

Thu Aug 15, 2019 7:12 pm

Hi,

Today I requested and received an NFR license for my home lab from the friendly and helpful marketing/sales team. Over the last few hours I went through the installation and configuration procedures, and so far everything has worked great. I have a single StarWind VSAN node so far. I've now mounted the iSCSI disk on my ESXi host, created a small new VM, and started the Ubuntu netinstall in it. Unfortunately, since the start of the installation I've been getting continuous "paths down" failures from the iSCSI initiator. For some reason the connection is not stable at all, even though the StarWind VM is on the same host (using its own vSwitch for communication).

I was able to get the following logs from /var/StarWind/StarWindVSA/logs/*log:

Code:

8/15 18:33:55.773 90 T[1c,b8b]: iScsiTask::handleTaskMgmtCmd: Management command: abort task (CmdSN 30763, ITT 0xb1790000) not found.
8/15 18:34:01.592 2e conf: *** ControlConnection::doControl: Plugin 'NVMfTarget' not found!
8/15 18:34:05.775 90 T[1c,b8d]: iScsiTask::handleTaskMgmtCmd: Management command: abort task (CmdSN 30763, ITT 0xb1790000) not found.
8/15 18:34:13.294 90 C[1c], LIN: iScsiConnection::receive: recvData returned 10058 (0x274a)!
8/15 18:34:13.295 90 C[1c], LIN: iScsiConnection::recvWorker: *** 'recv' thread: recv failed 10058.
8/15 18:34:15.802 8f Tgt: iScsiTarget::closeSession: close 'iqn.2008-08.com.starwindsoftware:starwindvsan.htz.netkern.local-teststorage1': 0 session(s) opened, 65536 more allowed.
8/15 18:34:15.802 8f S[1c]: iScsiSession::~iScsiSession: ~Session
8/15 18:34:16.039 9 Srv: iScsiServer::listenConnections: Accepted iSCSI connection from 172.16.66.10:17914 to 172.16.66.1:3260. (Id = 0x1d)
8/15 18:34:16.039 9 S[1d]: iScsiSession::iScsiSession: Session (0000000001CE7640)
8/15 18:34:16.039 9 C[1d], FREE: iScsiConnection::doTransition: Event - CONNECTED.
8/15 18:34:16.039 9 C[1d], XPT_UP: iScsiConnection::fsmT3: T3.
8/15 18:34:16.291 93 C[1d], XPT_UP: iScsiConnection::handleFirstLogin: Login request: ISID 0x00023d000001, TSIH 0x0000.
8/15 18:34:16.291 93 C[1d], XPT_UP: iScsiConnection::doTransition: Event - LOGIN.
8/15 18:34:16.291 93 C[1d], IN_LOGIN: iScsiConnection::fsmT4: T4.
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< String param 'InitiatorName': received 'iqn.1998-01.com.vmware:esxicloud01-7948ae11', accepted 'iqn.1998-01.com.vmware:esxicloud01-7948ae11'
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< String param 'TargetName': received 'iqn.2008-08.com.starwindsoftware:starwindvsan.htz.netkern.local-teststorage1', accepted 'iqn.2008-08.com.starwindsoftware:starwindvsan.htz.netkern.local-teststorage1'
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Enum param 'SessionType': received 'Normal', accepted 'Normal'
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Enum param 'HeaderDigest': received 'None', accepted 'None'
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Enum param 'DataDigest': received 'None', accepted 'None'
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'DefaultTime2Wait': received 2, accepted 2
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'DefaultTime2Retain': received 0, accepted 0
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Boolean param 'IFMarker': received No, accepted 0
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Boolean param 'OFMarker': received No, accepted 0
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'ErrorRecoveryLevel': received 0, accepted 0
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Boolean param 'InitialR2T': received No, accepted 0
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Boolean param 'ImmediateData': received Yes, accepted 1
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'MaxBurstLength': received 262144, accepted 262144
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'FirstBurstLength': received 262144, accepted 262144
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'MaxOutstandingR2T': received 1, accepted 1
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'MaxConnections': received 1, accepted 1
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Boolean param 'DataPDUInOrder': received Yes, accepted 1
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Boolean param 'DataSequenceInOrder': received Yes, accepted 1
8/15 18:34:16.291 93 Params: iScsiParameter::update: <<< Numeric param 'MaxRecvDataSegmentLength': received 131072, accepted 131072
8/15 18:34:16.293 93 PR: ResLunSessionEngine::registerSession: LUN 0: existing record for session 0x1d from iqn.1998-01.com.vmware:esxicloud01-7948ae11,00023D000001
8/15 18:34:16.293 93 PR: ResLunSession::setUnitAttention: Set UA 0x2901 (0x0) for session 0x1d from iqn.1998-01.com.vmware:esxicloud01-7948ae11,00023D000001.
8/15 18:34:16.293 93 Tgt: iScsiTarget::openSession: open 'iqn.2008-08.com.starwindsoftware:starwindvsan.htz.netkern.local-teststorage1': 1 session(s) opened, 65535 more allowed.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> ErrorRecoveryLevel=0.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> MaxConnections=1.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> HeaderDigest=None.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> DataDigest=None.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> TargetAlias=TestStorage1.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> OFMarker=No.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> IFMarker=No.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> InitialR2T=No.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> ImmediateData=Yes.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> MaxRecvDataSegmentLength=262144.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> MaxBurstLength=262144.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> FirstBurstLength=262144.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> DefaultTime2Wait=2.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> DefaultTime2Retain=0.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> MaxOutstandingR2T=1.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> DataPDUInOrder=Yes.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> DataSequenceInOrder=Yes.
8/15 18:34:16.293 93 Params: iScsiParams::createKeys: >>> TargetPortalGroupTag=1.
8/15 18:34:16.293 93 Srv: SwServerNode::bindThread: 0x92 (146) to group 0, affinity 0xf
8/15 18:34:16.293 93 S[1d]: iScsiSession::bindToServerNode: execThread handle = 0x0000000000000168, id = 146
8/15 18:34:16.293 93 Srv: SwServerNode::bindThread: 0x93 (147) to group 0, affinity 0xf
8/15 18:34:16.294 93 Srv: SwServerNode::bindThread: 0x91 (145) to group 0, affinity 0xf
8/15 18:34:16.294 92 T[1d,1]: iScsiTask::execLoginReq: session 0x1d, connection 0x1d : end of stage 1, next stage 3.
8/15 18:34:16.294 92 C[1d], IN_LOGIN: iScsiConnection::doTransition: Event - LOGIN_ACCEPT.
8/15 18:34:16.294 92 C[1d], LIN: iScsiConnection::fsmT5: T5.
8/15 18:34:17.258 92 PR: ResLunSessionEngine::returnUnitAttention: UA 0x2901 returned to opcode 0x28 for session 0x1d from iqn.1998-01.com.vmware:esxicloud01-7948ae11,00023D000001.

Unfortunately ESXi doesn't like that:

Code:

2019-08-15T19:07:07.586Z esxi vobd:  [APDCorrelator] 478208243978us: [esx.problem.storage.apd.start] Device or filesystem with identifier [eui.47127b055d9d0380] has entered the All Paths Down state

Some core info:
  • The StarWind VM has 1x 20 GB HDD disk and 1x 100 GB SSD disk
  • 1x storage device created with 15 GB, 128 MB RAM cache and 15 GB SSD cache
  • 1x VM created on said storage, which is mounted on ESXi
  • I'm using (and need to use) an MTU of 1400. This is set in the vSwitch and within the StarWind VSAN VM (Linux appliance)

So basically, this is what the issue looks like every few minutes:
Image

Any ideas? I can of course provide additional logs if helpful.
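
To see how often the disconnect/reconnect cycle repeats, the target log can be summarized with a small script. This is only a rough sketch: the regular expressions are derived from the log excerpt above, and the category names (`abort_task`, `recv_failed`, and so on) are labels made up here, not StarWind terminology.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt in this post.
# The dictionary keys are illustrative labels, not official event names.
PATTERNS = {
    "abort_task": re.compile(r"Management command: abort task"),
    "recv_failed": re.compile(r"recv failed \d+"),
    "session_closed": re.compile(r"iScsiTarget::closeSession"),
    "reconnect": re.compile(r"Accepted iSCSI connection from"),
}


def summarize(lines):
    """Count how many log lines match each event pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied (shortened) from the excerpt above.
sample = [
    "8/15 18:33:55.773 90 T[1c,b8b]: iScsiTask::handleTaskMgmtCmd: Management command: abort task (CmdSN 30763, ITT 0xb1790000) not found.",
    "8/15 18:34:05.775 90 T[1c,b8d]: iScsiTask::handleTaskMgmtCmd: Management command: abort task (CmdSN 30763, ITT 0xb1790000) not found.",
    "8/15 18:34:13.295 90 C[1c], LIN: iScsiConnection::recvWorker: *** 'recv' thread: recv failed 10058.",
    "8/15 18:34:15.802 8f Tgt: iScsiTarget::closeSession: close '...': 0 session(s) opened, 65536 more allowed.",
    "8/15 18:34:16.039 9 Srv: iScsiServer::listenConnections: Accepted iSCSI connection from 172.16.66.10:17914 to 172.16.66.1:3260. (Id = 0x1d)",
]

for name, count in sorted(summarize(sample).items()):
    print(f"{name}: {count}")
```

Running it over the full log file instead of `sample` (for example with `open(path)` as the line source) would show whether every APD event on the ESXi side lines up with an abort/close/reconnect sequence on the target side.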
Michael (staff)
Staff
Posts: 317
Joined: Thu Jul 21, 2016 10:16 am

Fri Aug 16, 2019 7:39 pm

Could you please clarify whether you are trying to connect to the target from the StarWind VM (deployed on the ESXi host) on the same ESXi host?
Could you please double-check the MTU settings on the VMkernel port?
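
One way to double-check the effective MTU is a ping with the "don't fragment" bit set and the largest payload that still fits. A minimal sketch of the arithmetic, assuming IPv4 without IP options (20-byte header) and a standard 8-byte ICMP echo header:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1400, 1500):
    print(f"MTU {mtu}: max ping payload {max_ping_payload(mtu)} bytes")
```

On the ESXi shell this would translate to something like `vmkping -d -s 1372 172.16.66.1` for an MTU of 1400 (the address is the target portal from the log above). If that size works but a slightly larger one fails, the path MTU really is 1400; if 1372 itself fails, some hop is set lower than expected.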
p_k
Posts: 3
Joined: Thu Aug 15, 2019 6:38 pm

Fri Aug 16, 2019 8:22 pm

Michael (staff) wrote: Could you please clarify whether you are trying to connect to the target from the StarWind VM (deployed on the ESXi host) on the same ESXi host?
Yes, correct. The StarWind VM runs on the same host from which I'm trying to mount the iSCSI target.
Michael (staff) wrote: Could you please double-check the MTU settings on the VMkernel port?
I have. The MTU is set correctly within the StarWind VM, as well as on the VMkernel adapter.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Mon Aug 19, 2019 11:56 am

Hi,
p_k wrote: I'm using (and need to use) an MTU of 1400. This is set in the vSwitch and within the StarWind VSAN VM (Linux appliance)
Could you please try the default MTU size of 1500? The MTU could be the reason for the "paths down" failures: iSCSI packets may be getting dropped because of the MTU size you have set.
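
For a sense of scale, here is a rough sketch of how many TCP segments a single iSCSI data segment is split into at each MTU. It assumes IPv4 and TCP headers of 20 bytes each (no options) plus the 48-byte iSCSI basic header segment, and uses the MaxRecvDataSegmentLength of 131072 that the initiator announced in the login log above.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
ISCSI_BHS = 48    # bytes, iSCSI basic header segment


def tcp_segments(mtu, data_segment_length):
    """Number of TCP segments needed for one iSCSI PDU at the given MTU."""
    mss = mtu - IPV4_HEADER - TCP_HEADER
    return math.ceil((ISCSI_BHS + data_segment_length) / mss)


for mtu in (1400, 1500):
    print(f"MTU {mtu}: {tcp_segments(mtu, 131072)} segments per 131072-byte PDU")
```

The difference between the two MTUs is only a handful of segments per PDU, so a lower MTU by itself mostly costs some efficiency. Drops become likely when one layer sends packets larger than another layer accepts.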
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Mon Aug 19, 2019 6:19 pm

Also, you said you changed the MTU in the vSwitch. I think you also need to change it on the VMkernel adapter. At least if you created the vSwitch and vmk with the default MTU and changed the vSwitch MTU later, the vmk will still have 1500.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Tue Aug 20, 2019 8:53 am

Yes, the MTU size should be the same at all levels: the vSwitch, the VMkernel adapter, and the interfaces inside the StarWind VMs.
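
As a simple illustration of that rule, the values collected from each layer can be compared against the intended MTU. The layer names and numbers below are hypothetical and only mirror the scenario danswartz described (vSwitch changed later, vmk left at the default); in practice they would be read from the ESXi and VM configuration by hand.

```python
def mtu_mismatches(layers, expected):
    """Return the layers whose MTU differs from the expected value."""
    return {name: mtu for name, mtu in layers.items() if mtu != expected}


# Hypothetical example values, not taken from the actual setup.
layers = {
    "vSwitch": 1400,
    "VMkernel adapter": 1500,
    "StarWind VM interface": 1400,
}

bad = mtu_mismatches(layers, expected=1400)
if bad:
    for name, mtu in bad.items():
        print(f"{name} has MTU {mtu}, expected 1400")
else:
    print("All layers use the same MTU")
```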
p_k
Posts: 3
Joined: Thu Aug 15, 2019 6:38 pm

Tue Aug 20, 2019 6:29 pm

Oleg(staff) wrote: Hi,
p_k wrote: I'm using (and need to use) an MTU of 1400. This is set in the vSwitch and within the StarWind VSAN VM (Linux appliance)
Could you please try the default MTU size of 1500? The MTU could be the reason for the "paths down" failures: iSCSI packets may be getting dropped because of the MTU size you have set.

While I can try an MTU of 1500, I unfortunately depend on the lower MTU of 1400 bytes. The ISP I rent the servers from offers a "virtual switch" (L2) across rented dedicated servers, which is unfortunately limited to 1400 bytes. Yes, 1500 might be fine while the StarWind VSAN VM is on the same host, but it won't help with my plan of spinning up a 2-node setup on two dedicated servers using said switch.

danswartz wrote: Also, you said you changed the MTU in the vSwitch. I think you also need to change it on the VMkernel adapter. At least if you created the vSwitch and vmk with the default MTU and changed the vSwitch MTU later, the vmk will still have 1500.

Yes. The MTU settings are correct:
Image
Image
Image
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Tue Aug 20, 2019 6:34 pm

Hmmm, no idea then :(
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Wed Aug 21, 2019 8:40 am

Just to clarify, are you using the same vSwitch for management and iSCSI traffic?