Hyper-V Failover Cluster MPIO Issue

Software-based VM-centric and flash-friendly VM storage + free version


Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Wed Jun 10, 2015 4:17 pm

Me as well. I'm about out of ideas and may just go to Hyper-V Replicas (which could mean up to a minute of data loss, but at least works...). I'm still trying a few more things to work around the StarWind issue, though. If any of them work, I'll write it up here.
bubu
Posts: 8
Joined: Mon Jun 08, 2015 12:49 pm

Mon Jun 15, 2015 1:41 pm

I hope you agree with me that it does not look like a StarWind-related issue. And I understand that MS support might not be helpful since StarWind is installed. MS and StarWind might point at each other (sorry guys :roll: ), but as far as I know the StarWind guys troubleshoot even indirect issues that are not related to their software during the installation assistance which they offer after purchase. Basically, they configure everything from scratch and leave the customer happy.

If I were you, I would not trade High Availability for Hyper-V Replicas.

It's up to you
Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Tue Jun 16, 2015 10:51 pm

bubu wrote:I hope you agree with me that it does not look like a StarWind-related issue. And I understand that MS support might not be helpful since StarWind is installed. MS and StarWind might point at each other (sorry guys :roll: ), but as far as I know the StarWind guys troubleshoot even indirect issues that are not related to their software during the installation assistance which they offer after purchase. Basically, they configure everything from scratch and leave the customer happy.

If I were you, I would not trade High Availability for Hyper-V Replicas.

It's up to you
I agree that the problem doesn't look like a StarWind-specific issue, but rather some interaction between the Virtual SAN software, Windows itself, and my hardware. Unfortunately, I'm only using the free version, so I don't get access to the StarWind auto-configuration team (as great as that would be). I'm going to keep trying different things to resolve the issue, and if I ever do, I'll post back here.

Thanks.

Branin
Tourwinner
Posts: 2
Joined: Wed Jun 17, 2015 2:35 am

Wed Jun 17, 2015 2:45 am

Branin-

Earlier in the thread, it was mentioned that you connect the opposite witness to pass the failover cluster validation. If you connected those targets (connected to witness2 from node1 and witness1 from node2), make sure that they are disconnected from the Targets tab and also removed from the Favorites tab on the opposite nodes after the validation test (i.e., witness2 is disconnected and removed from the Favorites tab on node1, etc.).

Also, make sure that MPIO for each target is set to Failover Only, with the Active connection set to 127.0.0.1 and the other connections set to Standby.
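A quick way to double-check that from an elevated prompt is to dump the MPIO disk list and the per-path states with the built-in mpclaim tool. The sketch below just wraps mpclaim from Python; the disk number is an example, so use whatever numbers mpclaim lists for your StarWind LUNs.

```python
# Sketch: list MPIO disks and their load-balance policy, then show per-path detail
# for one disk to confirm "Fail Over Only" with only the loopback path Active/Optimized.
import subprocess

def mpclaim(*args):
    """Run mpclaim.exe and return its output as text."""
    return subprocess.run(["mpclaim", *args], capture_output=True, text=True, check=True).stdout

print(mpclaim("-s", "-d"))        # all MPIO disks with their current load-balance policy
print(mpclaim("-s", "-d", "0"))   # path states for MPIO disk 0 (example number, check your own)
```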

We had this identical problem when we first set up a three-node cluster. Once we worked with support, they showed us the error of our ways and we fixed it to great success. We now have four clusters successfully deployed and fully tested (plug pulling, planned failover, cluster-aware updating, etc.). It just works now (thank God...).

If it is set up like I described above, then the problem is something else I can't readily identify.

Good luck,
Jim
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Wed Jun 17, 2015 12:41 pm

Branin,

May I ask whether there are any updates other than those described here: https://forums.starwindsoftware.com/vie ... f=5&t=4274 ?
Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Wed Jun 17, 2015 10:55 pm

Vladislav (Staff) wrote:Branin,

May I ask whether there are any updates other than those described here: https://forums.starwindsoftware.com/vie ... f=5&t=4274 ?
I believe I was able to "fix" this particular issue with the previous build (i.e. before 8116 and the L2 cache functionality), through one of two different mechanisms (or possibly both).

(Background: I have two 4-port 1Gbps network cards and two 10Gbps ports on the server motherboards, but only 1Gbps switches, so I'm using the two 10Gbps ports as direct connections between the two servers, with Windows 2012 R2 NIC Teaming and virtual adapters for Sync, iSCSI, Live Migration, and Cluster/CSV traffic on the 10Gbps team.)
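(Roughly, that team/vNIC side looks like the sketch below, driven from Python via PowerShell; the adapter, team, and switch names are placeholders rather than my exact setup.)

```python
# Sketch of a converged 10GbE team with host virtual adapters per traffic type.
# Adapter/team/switch names are placeholders; commands assume Windows Server 2012 R2.
import subprocess

def ps(command: str) -> None:
    """Run a PowerShell command and raise if it fails."""
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)

# Team the two on-board 10GbE ports (names come from Get-NetAdapter on your host).
ps('New-NetLbfoTeam -Name "Team10G" -TeamMembers "10G-Port1","10G-Port2" '
   '-TeamingMode SwitchIndependent -Confirm:$false')

# Hyper-V switch on top of the team, then a host vNIC per traffic type.
ps('New-VMSwitch -Name "Converged10G" -NetAdapterName "Team10G" -AllowManagementOS $false')
for vnic in ("Sync", "iSCSI", "LiveMigration", "ClusterCSV"):
    ps(f'Add-VMNetworkAdapter -ManagementOS -Name "{vnic}" -SwitchName "Converged10G"')
```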

1) I originally followed your 2-node Hyper-V document exactly (i.e. no teaming, individual VLANs on the adapters, etc.), but may have run into a problem with the heartbeat network going through my 1Gbps switches instead of directly over the 10Gbps adapters. I now have StarWind Sync/Heartbeat on one of the virtual adapters and a Heartbeat on another adapter (my Cluster/CSV adapter), and failover now happens more quickly.

2) I've also enabled "Allow clients to connect through this network" (under "Allow cluster network communication on this network") on my Cluster network in Failover Clustering.
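The same change can be scripted; here is a minimal sketch via PowerShell's Get-ClusterNetwork (the network name "Cluster Network 1" is a placeholder for whatever Failover Cluster Manager shows).

```python
# Sketch: allow both cluster and client traffic on a cluster network (Role 3).
# "Cluster Network 1" is a placeholder; Role values: 0 = none, 1 = cluster only, 3 = cluster and client.
import subprocess

subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command",
     '(Get-ClusterNetwork -Name "Cluster Network 1").Role = 3'],
    check=True)
```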

When I put both CSVs and the Witness on the same physical host and then pull the power cords on that host, the Witness fails over to the "live" host almost immediately. The CSVs take several seconds, but both fail over eventually (at least, they did so 10 times in a row) with my changes above. Before I made the changes above, typically one of the CSVs would fail over (again, after several seconds), but the other one would time out a few seconds later. (Not always, though; sometimes both would fail over within several seconds, and sometimes neither would fail over and both would time out. However, about 80-90% of the time, one CSV would fail over and the other one wouldn't.)

My problem in the other thread started after I installed build 8116. I have rebuilt both servers from scratch (down to and including recreating the RAID arrays) at least 25 times so far, trying to keep each attempt as clean as possible.

Thank you all!!!

Branin
Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Wed Jun 17, 2015 10:58 pm

Tourwinner wrote:Branin-

Earlier in the thread, it was mentioned that you connect the opposite witness to pass the failover cluster validation. If you connected those targets (connected to witness2 from node1 and witness1 from node2), make sure that they are disconnected from the Targets tab and also removed from the Favorites tab on the opposite nodes after the validation test (i.e., witness2 is disconnected and removed from the Favorites tab on node1, etc.).

Also, make sure that MPIO for each target is set to Failover Only, with the Active connection set to 127.0.0.1 and the other connections set to Standby.

We had this identical problem when we first set up a three-node cluster. Once we worked with support, they showed us the error of our ways and we fixed it to great success. We now have four clusters successfully deployed and fully tested (plug pulling, planned failover, cluster-aware updating, etc.). It just works now (thank God...).

If it is set up like I described above, then the problem is something else I can't readily identify.

Good luck,
Jim
Jim,

Thanks for the help. Unfortunately, I've stopped connecting the Witness to the "other" node for at least the last 15 server builds I've attempted, and I have confirmed that MPIO is set up as described. Planned failover and cluster-aware updating worked great! It was just plug pulling (or forcing a BSOD) that had issues. When you pulled the plug, how long did it take for your CSVs to fail over to one of your other nodes?

Thanks.

Branin
Tourwinner
Posts: 2
Joined: Wed Jun 17, 2015 2:35 am

Sun Jun 21, 2015 2:39 pm

Branin-

Sorry for the delay.

It was pretty quick. The delay is less than 2 seconds, and the only things that go down are the VMs on the host that was yanked. It works as designed.

The heartbeat stuff is pretty important. On our three-node clusters, we have the HB that VSAN provides, but we also use teamed management NICs on each host to provide heartbeat for Failover Cluster Manager. That team connects each NIC to a different switch in a stack, so we can lose a NIC or a switch and still be running.

In Failover Cluster Manager, make sure that the networks do not send any cluster traffic over the iSCSI or the Sync channels. If you have a separate HB network, that's great, but you can also send it over the management NIC and have it set to pass cluster and client traffic. Again, having stacked switches with teamed NICs there adds a nice layer of fault tolerance.

If I get a chance, I can upload some screenshots of our setup for you to review. I'm currently on a fishing trip and have other things to do...

Thanks,
Jim
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Tue Jun 23, 2015 4:58 pm

Branin,

I am glad that the issue is "resolved" :)

It still looks very weird. I will bear this in mind.
Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Mon Jul 20, 2015 9:45 pm

I spoke too soon. :-( I've still been having the problem and have spent the last month trying to track down the issue. I've finally figured it out (sort of).

Essentially, there are two issues (using Failover Only as the MPIO policy):

1) The MPIO paths aren't sticking around between boots. For example, the CSV is connected on paths 1 and 2 on one of my nodes and on paths 5 and 6 on the other node (just as an example). After a few reboots, I've found that the path numbers have completely switched: the first node now connects to the CSV via paths 5 and 6 and the second node via paths 1 and 2. I've set a "preferred" path in the MPIO tab in Disk Management, but the preferred-path box becomes unchecked when the paths randomly switch around like this.

2) If I set the local connection path (127.0.0.1) as my preferred active/optimized MPIO path and my remote connection path as Standby (or active/unoptimized), then pull the power cord on the server that "owns" the CSV (per Failover Cluster Manager), the CSV doesn't automatically fail over to the surviving node (it goes to "online pending" and then "failed"; I can manually bring it online at that point). However, if I set the remote connection as the active/optimized MPIO path and the local connection as the Standby (or active/unoptimized) path and then run the same test (pulling the power cord on the server that "owns" the CSV), the CSV successfully fails over (online pending for about a minute, then "Online"). Obviously, having all the traffic constantly go to the "remote" server negates the benefit of a local connection in the first place.

For issue 2 above, if I change the policy to Round Robin, the failover happens correctly. I've also tried Weighted Paths with an extremely high weight on the remote connection, but that doesn't work.

In short: MPIO path IDs don't remain consistent (making it very hard to set "preferred" paths), and connecting with the local connection as the Active connection doesn't work for failovers, while connecting with the remote connection as Active (or using Round Robin) does work.
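For anyone who wants to try the Round Robin workaround, it boils down to something like this rough sketch (the disk number is an example; check "mpclaim -s -d" for the StarWind LUN numbers on your node):

```python
# Sketch: switch an MPIO disk's load-balance policy to Round Robin with the built-in mpclaim tool.
# Disk 0 is an example number; mpclaim policy codes: 1 = Fail Over Only, 2 = Round Robin.
import subprocess

subprocess.run(["mpclaim", "-l", "-d", "0", "2"], check=True)
```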

Any ideas whether I'm doing something wrong, or whether this is a Microsoft bug or a StarWind bug? (I've opened a ticket with Microsoft for this issue, but so far they are having trouble even confirming that using StarWind Virtual SAN in a hyper-converged setup is a supported architecture at all.)

Thanks!

Branin
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Tue Jul 28, 2015 4:56 pm

Hi, Branin ;)

This MPIO mess sounds very familiar to me. I was also playing with these settings a while ago. In the end, I just left everything on Round Robin (even though StarWind officially recommends the Failover Only policy in hyper-converged scenarios) and have been fine with it so far. I haven't done any further investigation, since it seems like a Microsoft issue to me and I don't have much desire to deal with the nightmare that is MS support :)
Tarass (Staff)

Tue Aug 11, 2015 10:46 am

Hi all, thanks for the investigation and reporting.

These problems have been reported to the R&D team and will be reviewed/fixed in upcoming builds.
Kayakero
Posts: 6
Joined: Fri Mar 17, 2017 4:34 pm

Fri Mar 17, 2017 4:40 pm

Similar problem here. Hyperconverged two-node SQL FCI cluster, Windows Server 2016, SQL Server 2016.
Using the latest version, 8.10695.

I've configured all MPIO paths correctly as per the StarWind best practices: Failover Only, 127.0.0.1 Active, the other iSCSI subnet Standby.
After a graceful node reboot, all MPIO paths are moved around; I still have Failover Only, but my Active path is NOT the loopback anymore. Now my active path goes through the real iSCSI network and the loopback is Standby.
Why is this happening, and what problems might it bring? Performance? Not using the improved loopback optimization anymore?

Do I have to switch to Round Robin despite what the best practices say?
What's StarWind's position on this?
I don't think Windows changes preferred MPIO paths just because it wants to.

Doing some more research, it seems this is a known problem...
https://forums.starwindsoftware.com/vie ... f=5&t=4385

So do we have to use some manual script to fix the loopback preferred path after a reboot?
Is there some official example to borrow from?

Later I'll test how much performance is affected between Failover Only (active loopback), Failover Only (active remote), and Round Robin (loopback + remote).
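In the meantime, the kind of post-reboot check I have in mind is roughly the sketch below (detection only, not a fix; the MPIO disk number is a placeholder, and it just dumps the mpclaim path states plus the iSCSI connections so it is obvious when the loopback is no longer the Active path):

```python
# Rough sketch of an after-reboot check (not an official StarWind script).
# It prints the per-path states for one MPIO disk and the current iSCSI connections,
# so you can spot when the Active/Optimized path is no longer the 127.0.0.1 loopback.
import subprocess

MPIO_DISK = "0"  # example MPIO disk number; check "mpclaim -s -d" for the StarWind LUN

def run(cmd):
    """Run a command and return its text output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(run(["mpclaim", "-s", "-d", MPIO_DISK]))  # path states: Active/Optimized vs Standby
print(run(["powershell.exe", "-NoProfile", "-Command",
           "Get-IscsiConnection | Format-Table InitiatorAddress, TargetAddress -AutoSize | Out-String"]))

# If the loopback session is not the Active/Optimized path, it still has to be fixed by hand
# in the iSCSI Initiator / Disk Management MPIO tab.
```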
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Thu Mar 30, 2017 6:39 pm

Hello Kayakero,
The Failover Only MPIO policy is recommended only with a 1 GbE iSCSI channel. With a 10 GbE iSCSI channel, we recommend using Round Robin.
Since the loopback connection is much faster than a 1 Gbps partner connection, it makes sense to set the partner connection to "Standby" with the Failover Only MPIO policy for better performance.
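(For newly claimed devices, the MSDSM default policy can also be set to Round Robin up front; a minimal sketch, which does not change disks that are already configured:)

```python
# Sketch: set the Microsoft DSM default load-balance policy for newly claimed MPIO devices to Round Robin.
# Existing disks keep their current policy and would still need to be changed per-disk (e.g. with mpclaim).
import subprocess

subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command",
     "Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR"],
    check=True)
```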
Unfortunately, at this point we do not have a script to automate the "switch to loopback path" process.
I think the best way is to check the connections manually every time after restarting the server.
Kayakero
Posts: 6
Joined: Fri Mar 17, 2017 4:34 pm

Thu Mar 30, 2017 7:28 pm

Manually?? Seriously?