V5 HA Failover and MSCS



marcello
Posts: 15
Joined: Wed Feb 03, 2010 6:40 pm

Thu Feb 04, 2010 7:55 am

I have two HP servers (ML370) with attached SCSI storage (MS500G2) acting as iSCSI targets.
StarWind V5.5 (Beta) is installed on both servers.
Two HP servers (DL360) running MS Windows 2003 Ent. R2 (incl. SP2) have the MS iSCSI Initiator (latest release) installed and are configured as an MS cluster.
All servers have redundant iSCSI paths, and the sync channel is also fault tolerant through the HP NIC Teaming software.
The HA images were created as clean images (by the way, this took 55 hours for a 1.8 TB image) with Autosync enabled.
Every node in the cluster has 4 connections (2 to the primary and 2 to the secondary target) configured as Round-Robin, so all links are active.
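
To make the path layout concrete, here is a minimal sketch of round-robin selection over four paths, two per target. The path labels and the class are hypothetical and only model the behaviour described above; this is not how the MS iSCSI Initiator or MPIO is implemented internally.

```python
# Hypothetical path labels: two sessions to each target.
PATHS = ["primary-1", "primary-2", "secondary-1", "secondary-2"]


class RoundRobinPaths:
    """Toy model of round-robin path selection with failed paths skipped."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.alive = set(paths)
        self._next = 0  # index of the next path to try

    def mark_down(self, prefix):
        """Mark every path whose label starts with `prefix` as failed."""
        self.alive -= {p for p in self.paths if p.startswith(prefix)}

    def pick(self):
        """Return the next live path in rotation order."""
        if not self.alive:
            raise RuntimeError("no live paths left")
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path in self.alive:
                return path
        raise RuntimeError("no live paths left")


rr = RoundRobinPaths(PATHS)
first_round = [rr.pick() for _ in range(4)]   # all four links carry I/O
rr.mark_down("primary")                        # primary target reboots
after_failure = [rr.pick() for _ in range(4)]  # only secondary paths remain
```

In this model the rotation simply continues on the surviving paths; the real initiator also has to detect the dead sessions first, and that detection time is not modelled here.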

Now my problem: if I reboot the primary target server, the cluster service fails because it loses the connection to the quorum for a short time (this happens to all disk resources in the cluster, but it is fatal only for the quorum).
After one minute (this is by MS design) the cluster service starts again successfully.
Is there any way to prevent the cluster service from going down when it loses the connection to the quorum?
I tried changing some registry settings, such as TimeOutValue under Disk or LinkDownTime in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\xxxx\Parameters,
but nothing helps.
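
For anyone who wants to check what is currently set, here is a small sketch that reads the disk timeout from the registry. It assumes Python is available on the node and that TimeOutValue sits in its usual place under the Disk service key; the function name is made up for illustration. LinkDownTime is left out because the adapter instance number (the xxxx above) differs per machine.

```python
import sys

# Usual location of the disk class driver timeout (REG_DWORD, seconds).
DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
DISK_VALUE = "TimeOutValue"


def read_disk_timeout():
    """Return the disk TimeOutValue in seconds, or None if it cannot be read."""
    if not sys.platform.startswith("win"):
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, DISK_VALUE)
            return int(value)
    except OSError:
        # Key or value missing, or access denied.
        return None


timeout = read_disk_timeout()
```

This only reads the value; it changes nothing on the node.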
What else can I do?

wbr
Constantin (staff)

Thu Feb 04, 2010 9:57 am

This is an MSCS problem. Check this thread https://social.technet.microsoft.com/Fo ... Clustering on MS TechNet for fixes.
marcello
Posts: 15
Joined: Wed Feb 03, 2010 6:40 pm

Thu Feb 04, 2010 10:15 am

I know it is a problem of W2003 because the quorum model is a single point of failure.
I have searched the web for the last few days for hints on solving this problem but did not find any solution.
The problem in this case is that the cluster becomes aware of the MPIO path change from the primary to the secondary server, and I am looking for a value that forces the cluster to wait a little longer for the quorum.
Server 2008 does not have this problem because MS implemented other quorum models (e.g. Node and Disk Majority).

I hoped not to be the only one with this problem.
Constantin (staff)

Thu Feb 04, 2010 3:16 pm

If you find a workaround, please let us know.