HA Cluster - Clean Reboot of one node caused disconnect

Posted: Fri Dec 20, 2013 12:37 am
by jtmroczek
Hello:

While performing maintenance on our HA cluster, we rebooted one of the two nodes. Historically this has been handled gracefully. This time BOTH hosts sent logout messages to the initiators, and it was 6 minutes before the initiators could log back in. This affected some (possibly all) HA LUNs in the cluster.

Where should I start looking for the cause, and how can we prevent this in the future?

Additional info:
The change necessitating the reboot was an upgrade to the driver for the RAID controller.
StarWind Host OS: Windows 2008 R2 SP1
Initiator OS: Seen from both Windows 2008 R2 SP1 and Windows 2003 R2 SP2.
StarWind Version: v6.0.0 (Build 20120927, [SwSAN], Win64)

Thank you for any assistance you can provide to avoid this in the future.

~joe

Re: HA Cluster - Clean Reboot of one node caused disconnect

Posted: Fri Dec 20, 2013 12:35 pm
by Anatoly (staff)
Thank you for using StarWind.

I've seen such issues before, and as far as I understand the situation, you need to update your SAN software to the latest build.
Also, please make sure there are no hardware errors on the StarWind box by reviewing the Windows Application and System event logs.
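
For anyone following along, here is a minimal sketch of one way to pull recent errors from those two logs from a script. It assumes Python 3.7+ is available on the StarWind host and shells out to the built-in `wevtutil` tool; the helper names `build_wevtutil_command` and `recent_errors` are made up for illustration and are not part of StarWind or Windows.

```python
import subprocess

# XPath filter for event severity: Level 1 = Critical, Level 2 = Error.
ERROR_QUERY = "*[System[(Level=1 or Level=2)]]"


def build_wevtutil_command(log_name, max_events=50):
    """Return the wevtutil argument list for the newest error events in one log."""
    return [
        "wevtutil", "qe", log_name,
        "/q:" + ERROR_QUERY,      # only Critical and Error events
        "/c:" + str(max_events),  # cap the number of events returned
        "/rd:true",               # read newest events first
        "/f:text",                # human-readable output instead of XML
    ]


def recent_errors(log_name, max_events=50):
    """Run wevtutil (Windows only) and return its text output."""
    result = subprocess.run(
        build_wevtutil_command(log_name, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the host itself, calling `recent_errors("Application")` and `recent_errors("System")` and printing the results should list the most recent Critical and Error entries, which can then be compared against the time of the reboot.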

Re: HA Cluster - Clean Reboot of one node caused disconnect

Posted: Fri Dec 20, 2013 8:24 pm
by jtmroczek
Wow! I missed that we were on such an old version. The installers must have gotten mixed up. I think it is a testament to the quality of StarWind that the original 6.0.0 release has worked so well.

I have confirmed that no errors were reported in the Windows or IPMI event logs.

~joe

Re: HA Cluster - Clean Reboot of one node caused disconnect

Posted: Tue Dec 24, 2013 2:12 pm
by Anatoly (staff)
OK. I think the best approach here is to update the SAN software and see if it helps.

Re: HA Cluster - Clean Reboot of one node caused disconnect

Posted: Tue Dec 24, 2013 7:05 pm
by jtmroczek
Anatoly:

We have already performed the update. It is hard to know if the issue is resolved. Under the old code we rebooted over a dozen times without issue. I feel comfortable that there is nothing more to be done at this time.

~joe

Re: HA Cluster - Clean Reboot of one node caused disconnect

Posted: Mon Dec 30, 2013 10:46 am
by Anatoly (staff)
Great to know!

Let us know if you have any updates!