We've managed to get our VSAN CVM (Hyper-V) setup working really well so far and are super pleased with the results!
So far our issues have come down to hardware faults or our own misconfiguration.
One of the final remaining issues we have is that our Windows iSCSI initiators often return 'Target Error' when trying to make a new connection/session to VSAN.
I've read several times that the Windows iSCSI initiator implementation is flaky, so this may just be par for the course, but I thought I'd check.
We have two VSAN nodes, and two data/target networks, thus we have 4x paths to each HA target from our Windows initiator nodes.
After several attempts, and sometimes only after leaving a given Windows initiator node alone for a few hours, we can get at least one connection/session to each VSAN node for a given target. Getting a session on each of the 4x paths per target, however, can take a lot of attempts (luckily we're using PowerShell).
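To give an idea of what "a lot of attempts" looks like when scripted, here is a minimal sketch of a retry wrapper. The function name `Invoke-WithRetry` and its parameters are hypothetical, made up for illustration, and the attempt count and delay are arbitrary; it is not our exact script.

```powershell
# Hypothetical helper: run a script block until it succeeds or the attempts run out.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 10,   # arbitrary default
        [int] $DelaySeconds = 5    # arbitrary default
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Return whatever the action outputs on the first success.
            return & $Action
        }
        catch {
            Write-Warning "Attempt $attempt of $MaxAttempts failed: $($_.Exception.Message)"
            # Out of attempts: rethrow the last error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Example call (commented out; -ErrorAction Stop turns a failure into a
# terminating error so that the catch block above actually sees it):
# Invoke-WithRetry -MaxAttempts 20 -DelaySeconds 10 -Action {
#     Connect-IscsiTarget -NodeAddress '<target IQN>' -TargetPortalAddress '<portal IP>' `
#         -IsMultipathEnabled $true -IsPersistent $true -ErrorAction Stop
# }
```

The wrapper only hides the symptom, of course; the question is why the first attempt fails so often.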
Yaroslav previously (and helpfully) advised that we should use multiple sessions/connections to improve performance for SSD/NVMe (we have both). Given we have 4x paths, though, I'm not sure how many sessions we should have per path (2x would mean 8x per target, and so on). Yaroslav (or anyone else), if you could clarify, that would be fantastic.
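To make the arithmetic concrete, here is a small sketch that enumerates the sessions for one target. The node and network names are placeholders, and `$sessionsPerPath = 2` is just the example value from above, not a recommendation.

```powershell
# Placeholder names; these stand in for our two VSAN nodes and two data networks.
$vsanNodes       = @('vsan-node-1', 'vsan-node-2')
$dataNetworks    = @('data-net-1', 'data-net-2')
$sessionsPerPath = 2   # the value I'm asking about

# One entry per session: every node/network pair is a path,
# and each path carries $sessionsPerPath sessions.
$plan = foreach ($node in $vsanNodes) {
    foreach ($net in $dataNetworks) {
        foreach ($n in 1..$sessionsPerPath) {
            [pscustomobject]@{ Node = $node; Network = $net; Session = $n }
        }
    }
}

$pathCount = $vsanNodes.Count * $dataNetworks.Count
'{0} paths x {1} sessions per path = {2} sessions per target' -f $pathCount, $sessionsPerPath, @($plan).Count
```

With the values shown this gives 4 paths and 8 sessions per target, and the total scales linearly with whatever per-path value is recommended.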
Here is an example of a PowerShell command which resulted in 'Target Error':
Connect-IscsiTarget -NodeAddress 'iqn.2008-08.com.starwindsoftware:10.X.X.X-03-04-csv-02-02' -TargetPortalAddress 'X.X.3.173' -InitiatorPortalAddress 'X.X.3.106' -InitiatorInstanceName 'ROOT\iScsiPrt\0000_0' -IsMultipathEnabled $true -IsPersistent $true
The VSAN log entries for the same connection attempt:
9/19 5:37:12.452106 9 Srv: iScsiServer::listenConnections: Accepted iSCSI connection from X.X.3.106:59753 to X.X.3.173:3260. (Id = 0x10b8)
9/19 5:37:12.452218 9 S[10b8]: iScsiSession::iScsiSession: Session (00007F36791C4C80)
9/19 5:37:12.452235 9 C[10b8], FREE: iScsiConnection::doTransition: Event - CONNECTED.
9/19 5:37:17.584373 a4 Srv: *** SwSocket::Recv: Swn_SocketRecv() failed with error 10035 (0x2733)!
9/19 5:37:17.584532 a4 C[10b8], XPT_UP: iScsiConnection::recvData: Recv returned 10035 (0x2733)!
9/19 5:37:17.584641 a4 C[10b8], XPT_UP: iScsiConnection::receive: recvData returned error 10035 (0x2733)!
9/19 5:37:17.584684 a4 C[10b8], XPT_UP: iScsiConnection::recvWorker: *** 'recv' thread: recv failed 10058.
9/19 5:37:17.685115 12c S[10b8]: iScsiSession::~iScsiSession: ~Session
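If I read the two error numbers in that log as standard Winsock codes (an assumption on my part; I don't know for certain that VSAN uses the same numbering internally), they decode as in the sketch below. The descriptions are my paraphrase of the Winsock documentation.

```powershell
# Standard Winsock error codes; assumes the VSAN log uses the same numbering.
$winsockErrors = @{
    10035 = 'WSAEWOULDBLOCK: a non-blocking socket operation could not be completed immediately'
    10058 = 'WSAESHUTDOWN: send or receive attempted after the socket was shut down'
}

# Print each code in decimal and hex next to its meaning.
$decoded = foreach ($code in 10035, 10058) {
    '{0} (0x{0:X}) = {1}' -f $code, $winsockErrors[$code]
}
$decoded
```

Read that way, the target accepts the connection, then a receive on it fails roughly five seconds later and the session is torn down.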
I also read in the forum that setting "TcpKeepAlivePeriod" to "20" could help with the Windows initiator, but that didn't work either (and the forum post noted that the underlying issue was fixed in a VSAN version some time ago).
One complication here may be that our target/data networks are InfiniBand (IPoIB), though everything is very fast once the initiator connections are established, so I'm not sure that's the problem.
We are running "StarWind Virtual SAN (VSAN) v8.0.0 (Build 15469)". I know there's a newer version, but we've just got it all running sweet (apart from this issue) so I've been reluctant to update just yet! However, I can do that this weekend if you think that could fix it.
Any hints or tips much appreciated!
Many thanks
James