ISCSI Connection lost

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

Sebastian_Tappe
Posts: 5
Joined: Mon Jan 23, 2012 12:28 pm

Tue Mar 27, 2012 10:57 am

Hi,

after updating to 5.8, we keep getting these messages in our event log (translated...;) ):

- The target did not respond in time to a task management request.
- The target has not responded to a SCSI request in time. CDB information contained in the backup data.
- Target or LUN can not be reset. It attempts to reestablish the session
- A connection to the target was interrupted, however, the initiator could reconnect with the target. The target name is included in the backup data

The HA-Storage is then sometimes out-of-sync and it performs a resync.
Sometimes the starwind management console isn't accessible, so we have to restart the service (resulting in a fullsync).
This happened 4 times in the last 7 days...
Once we got a bluescreen on both starwinds (Win 2008 R2).

The overall result is that our virtual Linux servers stop working, and after a reboot they come up with filesystem errors.

Hardware and drivers:
Sync NIC: Intel(R) 10 Gigabit AT2 Server Adapter, Driver: 2.4.29.1 JumboFrames enabled
storage NIC(teamed): Intel(R) Gigabit ET Dual Port Server Adapter, Driver:11.7.32.0
Windows ISCSI Initiator Driver: 6.1.7600.16385
StarportStorageController: 5.5.1.860

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Disk TimeOutValue already set to 60 sec.
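For anyone who wants to verify the same setting on each node, here is a minimal sketch in Python. It assumes a Windows host and uses the standard `winreg` module; the helper names `read_disk_timeout` and `timeout_ok` are made up for illustration, and the 60-second threshold is simply the value mentioned above.

```python
import sys

# Registry location quoted in the post above.
DISK_KEY = r"SYSTEM\CurrentControlSet\services\Disk"
VALUE_NAME = "TimeOutValue"


def read_disk_timeout():
    """Return the disk TimeOutValue in seconds, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except FileNotFoundError:
        # Key or value does not exist, so Windows uses its built-in default.
        return None


def timeout_ok(seconds, minimum=60):
    """True if a timeout was read and it is at least `minimum` seconds."""
    return seconds is not None and seconds >= minimum


if __name__ == "__main__":
    current = read_disk_timeout()
    print("TimeOutValue:", current, "| sufficient:", timeout_ok(current))
```

Running this on both nodes should make it easy to spot a node where the value was never set.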
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Tue Mar 27, 2012 1:01 pm

Hello Sebastian,
Are you using StarPort as the iSCSI initiator?
Also, what is the network configuration of the SAN<->SAN and Client<->SAN links?
Max Kolomyeytsev
StarWind Software
Sebastian_Tappe
Posts: 5
Joined: Mon Jan 23, 2012 12:28 pm

Tue Mar 27, 2012 2:17 pm

Hi Max,
We are using the Windows iSCSI initiator (the default with the StarWind 5.6 installation).
There is a separate direct link for StarWind synchronisation (10 Gb).
Client storage also runs on a separate link (2 x 1000 Mbit Intel); this link is also used for heartbeat.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed Mar 28, 2012 10:14 am

First of all, what StarWind build are you using? You can always check the latest build number by using the link below:
http://www.starwindsoftware.com/forums/ ... tml#p14920
Sebastian_Tappe wrote:- The target did not respond in time to a task management request.
- The target has not responded to a SCSI request in time. CDB information contained in the backup data.
Would you kindly clarify this? How did you find this out? Thank you.
Sebastian_Tappe wrote:- A connection to the target was interrupted, however, the initiator could reconnect with the target. The target name is included in the backup data
I think that you have deleted the device, but not the target that was linked with this device.
Sebastian_Tappe wrote:The HA-Storage is then sometimes out-of-sync and it performs a resync.
To me this looks like a sync channel data link issue. Who is the NIC vendor? Have you updated the driver? Also, I would recommend configuring several NICs for the sync channel through our target wizard instead of using teaming.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Sebastian_Tappe
Posts: 5
Joined: Mon Jan 23, 2012 12:28 pm

Wed Mar 28, 2012 12:22 pm

Hi !

Our current version is "StarWind iSCSI SAN Software v5.8.0 (Build 20120124, [SwSAN], Win64)",
AKA 5.8.1889 (obviously not the most recent one). Do we need a full sync after a minor release
upgrade?

Regarding
"The target did not respond in time to a task management request."
"The target has not responded to a SCSI request in time. CDB information contained in the backup data."
these are messages from the Windows event log (Windows iSCSI), translated manually from German to English.

Interesting thing is that we have no indication of an error within Starwind server's Log. But in the crash situation
it stopped working all of a sudden. After rebooting both machines we marked one Node as Synced and resynced the other.

So what do you think about replacing the M$ iscsi initiator with the starwind equivalent ?

We use only Supermicro Mainboards and INTEL NICs.
Our problem is between both HA nodes. They are connected by Intel 10GBe copper (Intel AT2 Server adapter)
directly without switch or anything else. Driver Version 2.4.29.1 - - Jumbo Packets enabled.
Will replace the Network driver with the most recent one immediately.

In parallel we did some performance measurements with NTTTCP - we currently only have 67% efficiency on the
10 Gbit - line ( PCI 8x ). We are replacing the cable currently.
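To put the 67% figure into absolute numbers, here is a small Python sketch. It assumes that "efficiency" means the measured share of the nominal 10 Gbit/s line rate and it ignores protocol overhead; the function name `effective_throughput` is only illustrative.

```python
def effective_throughput(link_gbit, efficiency):
    """Return (Gbit/s, MB/s) actually achieved on a link.

    link_gbit  -- nominal line rate in Gbit/s (10 for the sync link)
    efficiency -- measured fraction of the line rate, e.g. 0.67
    Uses decimal units: 1 Gbit = 1000 Mbit, 8 bits per byte.
    """
    gbit_per_s = link_gbit * efficiency
    mbyte_per_s = gbit_per_s * 1000 / 8
    return gbit_per_s, mbyte_per_s


if __name__ == "__main__":
    gbit, mbyte = effective_throughput(10, 0.67)
    print(f"{gbit:.1f} Gbit/s, about {mbyte:.0f} MB/s")
```

Under these assumptions the sync link delivers about 6.7 Gbit/s, or roughly 840 MB/s, which is the ceiling for resync traffic between the two nodes until the efficiency issue is fixed.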

So I suggest the following plan (please answer the included questions)

- Switch to most recent Starwind Build of 5.8 (Do we need a full sync ?)
- Replace Networking drivers with recent ones
- Replace Microsoft Iscsi Initiator (Should we do that ? How to do that)
- Fix Network efficiency issue (we did not see any No link - loss messages in Event log)

Sebastian
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Thu Mar 29, 2012 1:08 pm

Sebastian_Tappe wrote:Our current version is "StarWind iSCSI SAN Software v5.8.0 (Build 20120124, [SwSAN], Win64)", AKA 5.8.1889 (obviously not the most recent one). Do we need a full sync after a minor release upgrade?
No, a fast sync should be initiated automatically after installing the update on one of the nodes.
Sebastian_Tappe wrote:Regarding
"The target did not respond in time to a task management request."
"The target has not responded to a SCSI request in time. CDB information contained in the backup data."
these are messages from the Windows event log (Windows iSCSI), translated manually from German to English.
I am 99% sure that these errors are related to the HA device going out of sync.
Sebastian_Tappe wrote:We use only Supermicro Mainboards and INTEL NICs.
Our problem is between both HA nodes. They are connected by Intel 10GBe copper (Intel AT2 Server adapter)
directly without switch or anything else. Driver Version 2.4.29.1 - - Jumbo Packets enabled.
Will replace the Network driver with the most recent one immediately.

In parallel we did some performance measurements with NTTTCP - we currently only have 67% efficiency on the
10 Gbit - line ( PCI 8x ). We are replacing the cable currently.
Well, I think you need to repeat the tests after you have updated and replaced everything you planned to. Please keep us updated.
Sebastian_Tappe wrote:So I suggest the following plan (please answer the included questions)

1 Switch to most recent Starwind Build of 5.8 (Do we need a full sync ?)
2 Replace Networking drivers with recent ones
3 Replace Microsoft Iscsi Initiator (Should we do that ? How to do that)
4 Fix Network efficiency issue (we did not see any No link - loss messages in Event log)
1. No need for a full sync, as I've mentioned above.
2. Where exactly do you want to replace the MS initiator?
Sebastian_Tappe
Posts: 5
Joined: Mon Jan 23, 2012 12:28 pm

Thu Mar 29, 2012 1:55 pm

On our production StarWind nodes we are running 5.8, updated from 5.6.
On these machines the MS iSCSI service is running and populated with the partner iSCSI targets.
I have built a test environment with 5.8 where no MS iSCSI service is running,
so the initiator must be in StarWind itself.
Maybe the MS initiator doesn't work with an updated version of StarWind?
Last edited by Sebastian_Tappe on Thu Mar 29, 2012 2:34 pm, edited 1 time in total.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Thu Mar 29, 2012 2:32 pm

Could you please clarify one thing? Open the starwind.cfg file and tell me what you see in the following line:

<transport value="%parameter%"/>
Sebastian_Tappe
Posts: 5
Joined: Mon Jan 23, 2012 12:28 pm

Thu Mar 29, 2012 2:32 pm

Hi,

there is no such line on the production nodes...
But on my test machine I found this: <transport value="auto"/>
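To compare the nodes quickly, a check along these lines could be scripted. This is only a sketch in Python: it searches the raw text of starwind.cfg for the `<transport value="..."/>` element quoted above and makes no assumption about the rest of the file. The function name `find_transport` and the sample strings are hypothetical, not taken from a real configuration.

```python
import re

# Matches the element quoted in this thread, e.g. <transport value="auto"/>
TRANSPORT_RE = re.compile(r'<transport\s+value="([^"]*)"\s*/>')


def find_transport(cfg_text):
    """Return the transport value found in the config text, or None if absent."""
    match = TRANSPORT_RE.search(cfg_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical fragments standing in for the two situations described above.
    test_node_cfg = '<settings>\n  <transport value="auto"/>\n</settings>'
    production_node_cfg = "<settings>\n</settings>"
    print("test node:", find_transport(test_node_cfg))
    print("production node:", find_transport(production_node_cfg))
```

In practice the text would come from reading starwind.cfg on each node; a result of None corresponds to the "no such line" case on the upgraded machines.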
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Mar 30, 2012 10:01 am

This line means that StarWind is using its own transport for synchronization. I think it is better to use it for synchronization and to use the MS initiator on client machines to connect to the targets.