iSCSI reconnecting repeatedly

gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Tue Aug 16, 2011 1:40 pm

Hi...

Max is looking into this problem with me but we seem to be at a standstill at the moment, so I wondered if anyone else might be able to comment or make suggestions.

I have a two-node Hyper-V cluster on Windows Server 2008 R2 SP1, with StarWind installed on the same two boxes.

I tried to upgrade server A from 5.6.1690 to 5.7.1721, but the installer hung and we had to kill it. Max then directed me to install 5.7.1727, which fixes issues with two-node Hyper-V clusters. It installed OK on both server A and server B.

The problem is that the iSCSI connections are repeatedly reconnecting on both servers.

Windows Event log shows event 20: "Connection to the target was lost. The initiator will attempt to retry the connection." followed by event 34: "A connection to the target was lost, but Initiator successfully reconnected to the target. Dump data contains the target name." And it is doing this for all 3 of my targets. The Windows Event log is filled with literally thousands of these.
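To put a number on "thousands", the event 20 / event 34 pairs can be tallied per target. The following is only a minimal sketch: it assumes the events have already been exported into a chronological list of (event_id, target_name) tuples, and both that record shape and the target names in the sample are hypothetical, not taken from the actual logs.

```python
from collections import Counter

# Event IDs quoted in the post: 20 = connection lost, 34 = reconnected.
LOST, RECONNECTED = 20, 34


def count_reconnect_cycles(events):
    """Count completed lost -> reconnected cycles per target.

    `events` is an iterable of (event_id, target_name) tuples in
    chronological order. This record shape is an assumption for the sketch.
    """
    pending = set()     # targets with a loss not yet followed by a reconnect
    cycles = Counter()  # target name -> number of completed cycles
    for event_id, target in events:
        if event_id == LOST:
            pending.add(target)
        elif event_id == RECONNECTED and target in pending:
            pending.discard(target)
            cycles[target] += 1
    return cycles


# Hypothetical sample data, for illustration only.
sample = [
    (20, "target1"), (34, "target1"),
    (20, "target2"), (20, "target1"),
    (34, "target2"), (34, "target1"),
    (20, "target3"),  # lost but never reconnected, so not counted as a cycle
]
result = count_reconnect_cycles(sample)
print(dict(result))
```

A target that shows a loss with no matching reconnect (target3 in the sample) is the one worth looking at first, since it points to a connection that stayed down rather than one that flapped.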

I am also getting a smaller number of other related errors (some of which I know are just a consequence of the two above):

- Connection to the target was lost. The initiator will attempt to retry the connection.
- Target failed to respond in time to a Task Management request.
- Target sent an invalid iSCSI PDU. Dump data contains the entire iSCSI header.
- Initiator sent a task management command to reset the target. The target name is given in the dump data.
- The initiator could not send an iSCSI PDU. Error status is given in the dump data.
- Target failed to respond in time for a login request.

One other detail, whose impact I'm not sure of: on server A the MPIO panel shows only one device, whereas server B shows two. The missing one is "Vendor 8Product 16". It was there when 5.6 was running. The strange part is that Max checked the connections and it appears that MPIO is actually in use for the partner targets.

I sent Starwind and Windows logs to Max for R&D to look at. No word back yet.

This is really holding up my project work and my VMs are at risk like this. Any advice would be appreciated. I thought about going back to 5.6 - is that even possible? My gut feeling is that something is corrupted somewhere (maybe even in Windows itself?) and I'm not sure if going back to 5.6 would solve it.

Graham
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Aug 16, 2011 9:41 pm

This setup (two-node Hyper-V only) is known to have some issues that are related to the MS iSCSI initiator. Support techies should let you know some configuration tricks (ping settings) to keep both nodes from reconnecting. We'll fix this completely in the next V5.7 build, and in V5.8 we'll get rid of the MS iSCSI initiator completely and roll back to our own interconnect transport, as MS iSCSI is slow and unreliable.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Thu Aug 18, 2011 4:17 pm

For anyone reading, the solution was to set iscsiPingPeriod to 20 in Starwind.cfg, restart Starwind, let sync finish, then do the same on the second server.
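The ordering matters here: one node at a time, with sync finished before the second node is touched. A minimal sketch of that sequence follows. The three callbacks are hypothetical stand-ins for the manual steps (editing Starwind.cfg, restarting the service, watching sync in the console); nothing here is a StarWind API, and the config file syntax is deliberately not shown.

```python
def rolling_config_change(nodes, apply_setting, restart_service, wait_for_sync):
    """Apply a setting one node at a time, waiting for sync before moving on.

    All three callbacks are hypothetical placeholders for manual steps.
    """
    for node in nodes:
        apply_setting(node)    # e.g. set iscsiPingPeriod to 20 in Starwind.cfg
        restart_service(node)  # restart StarWind on this node only
        wait_for_sync(node)    # do not continue until sync has finished


# Stand-in callbacks that only record the order of operations.
log = []
rolling_config_change(
    ["serverA", "serverB"],
    apply_setting=lambda n: log.append(("edit", n)),
    restart_service=lambda n: log.append(("restart", n)),
    wait_for_sync=lambda n: log.append(("sync", n)),
)
print(log)
```

Doing both nodes at once would take both copies of the storage offline together, which is why the second server waits for the first to finish syncing.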

Thanks Max and Anton!
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Aug 18, 2011 9:33 pm

Thank you for confirming that this works. We'll include it as a core fix in the upcoming re-published V5.7 build.
gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Tue Sep 06, 2011 12:48 pm

This is happening to me again now, except it only happens about 3 times in 24 hours and only happens on one of my machines (out of 4 total). I am on v5.7.1727 and have the iscsiPingPeriod value set to 20. This started happening after I applied Windows Updates (to all of my machines) last week.

Any ideas?
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Tue Sep 06, 2011 2:37 pm

Update your build: the latest one is 5.7.1733. You can download it using this link.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Tue Sep 06, 2011 3:35 pm

Anatoly (staff) wrote:Update your build: the latest one is 5.7.1733. You can download it using this link.
Ok done. Now will wait 24 hours to see if the errors stop.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Sep 07, 2011 9:57 am

Please keep us updated. Thank you!
jeffhamm
Posts: 47
Joined: Mon Jan 03, 2011 6:43 pm

Wed Sep 07, 2011 1:49 pm

You mention that this only affects a 2 Node Hyper-V cluster. I assume this does not affect a 3 Node Hyper-V cluster? Is this because the issue lies with the connection to the Quorum drive? Is it an issue with the Quorum type?

Thanks,
Jeff
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Sep 07, 2011 4:13 pm

There's no version that supports a 3-node cluster yet, at least not one available for public experiments.
jeffhamm
Posts: 47
Joined: Mon Jan 03, 2011 6:43 pm

Wed Sep 07, 2011 5:07 pm

So if I have a StarWind HA Setup (2 physical servers), I can only have up to two Hyper-V servers as members of a Hyper-V cluster (2 more physical servers) connect to StarWind at a time? Was not aware of that limitation. Currently running 2 Hyper-V servers in a MS cluster connected to the StarWind HA nodes, but was planning on adding a 3rd Hyper-V server to the MS cluster at some point in the future.

Thanks,
Jeff
gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Wed Sep 07, 2011 6:15 pm

I believe Anton is referring to just a simple two-node setup where both StarWind and Hyper-V are installed on each box and connected with crossover cables. In your case you can add more Hyper-V servers.
gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Wed Sep 07, 2011 6:15 pm

anton (staff) wrote:Please keep us updated. Thank you!
No errors after 24 hours, so looks like v5.7.1733 solved it.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Sep 07, 2011 8:24 pm

No, I meant something totally different. At the moment you can have as many Hyper-V hosts connected to a pair of HA StarWind nodes as you want. What you cannot do, however, is have more than two StarWind nodes synchronized with each other. Upcoming versions will be able to triplicate (have 3 StarWind nodes) and keep one extra copy replicated asynchronously, making so-called 2+1 and 3+1 scenarios possible.