HA Network Failover Problem

rchisholm
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Fri Sep 23, 2011 4:35 pm

vSphere for our network.
jeffhamm
Posts: 47
Joined: Mon Jan 03, 2011 6:43 pm

Sat Sep 24, 2011 2:20 am

Our setup is similar to hixont - Hyper-V R2 Cluster / SCVMM2008 R2 / DPM 2010
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Sep 26, 2011 10:58 am

In such a case you should really look at our new StarWind Native SAN for Hyper-V product.
hixont wrote:
anton (staff) wrote:1) We will support all of the listed scenarios, so you won't be forced to pick one particular way of working.
Thanks.
anton (staff) wrote:2) What hypervisor do you run at this moment?
I am a Hyper-V (Windows 2008 R2) shop and bounce between Hyper-V Manager, Failover Cluster Manager and SCVMM 2008 R2 as my management consoles.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Sep 26, 2011 10:59 am

Bad. Legacy mode (dedicated hardware + Windows + StarWind) only.
rchisholm wrote:Vsphere for our network.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Sep 26, 2011 10:59 am

...and you should also take a look at Native SAN from us :)
jeffhamm wrote:Our setup is similar to hixont - Hyper-V R2 Cluster / SCVMM2008 R2 / DPM 2010
rchisholm
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Mon Sep 26, 2011 5:07 pm

I'd prefer to put it on physical hardware anyway. I'd just prefer that it not be a fully built storage node. I could easily throw it on a dedicated blade.
anton (staff) wrote:Bad. Legacy mode (dedicated hardware + Windows + StarWind) only.
rchisholm wrote:Vsphere for our network.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Sep 27, 2011 11:35 am

StarWind iSCSI SAN is for dedicated hardware, while StarWind Native SAN for Hyper-V is for keeping storage on the same box. There's no "one size fits all" here :)
rchisholm wrote:I'd prefer to put it on physical hardware anyway. Just prefer for it not be a fully built storage node. I could easily throw it on a dedicated blade.
anton (staff) wrote:Bad. Legacy mode (dedicated hardware + Windows + StarWind) only.
rchisholm wrote:Vsphere for our network.
jeffhamm
Posts: 47
Joined: Mon Jan 03, 2011 6:43 pm

Tue Sep 27, 2011 5:30 pm

Anton,

I've been doing some more testing, and I'm getting the results below, which I believe are what you said should be expected:

- Pull power cord on Node1 (priority = primary on all LUNs). Result is auto failover to Node2 with no downtime

- Pull power cord on Node2 (priority = secondary on all LUNs). Result is auto failover to Node1 with no downtime


But I have a question about the original scenario from when I started this thread, and how to recover from "split brain":

- Pull all network cables on Node1 (priority = primary on all LUNs). Result is "split brain" and both nodes go offline.

But let's assume that I am never able to get Node1 back online for some reason. How can I force the targets on Node2 to come back online? And then after I am able to replace the hardware on Node1, how do I bring it back online later without causing data corruption?

thanks!
Jeff
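
The three test results above can be written down as a small toy model. This is only a sketch of the outcomes reported in this post, not StarWind's actual failover logic. The `Node` class, the `targets_online` function, and the rule "both nodes powered but unable to reach each other means both go offline" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for one HA storage node.
    name: str
    powered: bool = True
    network_up: bool = True


def targets_online(node1: Node, node2: Node) -> dict:
    """Return which nodes keep serving targets, per the results reported above."""
    both_powered = node1.powered and node2.powered
    link_ok = node1.network_up and node2.network_up
    if both_powered and not link_ok:
        # Reported result: "split brain", both nodes go offline.
        return {node1.name: False, node2.name: False}
    # Otherwise every powered node keeps serving (auto failover to the survivor).
    return {node1.name: node1.powered, node2.name: node2.powered}


# The three tests from the post.
power_pull_node1 = targets_online(Node("Node1", powered=False), Node("Node2"))
power_pull_node2 = targets_online(Node("Node1"), Node("Node2", powered=False))
cables_pulled_node1 = targets_online(Node("Node1", network_up=False), Node("Node2"))
```

In this model only the third case leaves no node serving the targets, which is the situation the recovery question is about.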
hixont
Posts: 25
Joined: Fri Jun 25, 2010 9:12 pm

Tue Sep 27, 2011 9:48 pm

Why fix what's not broken? I already have a working StarWind SAN/Hyper-V configuration that I am very happy with: four Hyper-V hosts (more to come) and two dedicated storage servers running StarWind in an HA configuration. I really prefer to avoid an "all my eggs in one basket" scenario. Right now my single points of failure are few, so short of losing a particular production switch (single point of failure network) or the entire data center (no alternate or offsite), I can sleep at night. I'm working on removing those last two SPoFs from my ToDo list.
anton (staff) wrote:In such a case you should really look at our new StarWind Native SAN for Hyper-V product.
hixont wrote:
anton (staff) wrote:2) What hypervisor do you run at this moment?
I am a Hyper-V (Windows 2008 R2) shop and bounce between Hyper-V Manager, Failover Cluster Manager and SCVMM 2008 R2 as my management consoles.
Last edited by hixont on Wed Sep 28, 2011 4:15 pm, edited 1 time in total.
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Wed Sep 28, 2011 1:27 am

We give you a weapon of choice here: you can either go for a traditional "separate SAN" infrastructure or build everything within 2 boxes. Both ways have their pros and cons.
Max Kolomyeytsev
StarWind Software
jeffhamm
Posts: 47
Joined: Mon Jan 03, 2011 6:43 pm

Thu Sep 29, 2011 7:54 pm

Anton - Have you had an opportunity to review my latest post regarding the "split brain" issue?

Thanks!
Jeff
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Sep 30, 2011 9:57 am

Good. It works absolutely as it's expected to work.

You can start either node online and synchronize the ex-slave with the new master in the slave -> master direction.

You're not going to have any data corruption.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Sep 30, 2011 9:58 am

Of course, I lost it among the many threads :( Sorry for the delay, I have just replied.
jeffhamm wrote:Anton - Have you had an opportunity to review my latest post regarding the "split brain" issue?

Thanks!
Jeff
jeffhamm
Posts: 47
Joined: Mon Jan 03, 2011 6:43 pm

Fri Sep 30, 2011 1:15 pm

"You can start any node on-line and synchronize ex-slave with a new master in slave -> master direction. "

How do I bring Node2 back online from the GUI? With Node1 down, these are the only options I see:

Image



Image
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Tue Oct 04, 2011 9:45 am

Well, you need to start a full sync. Please note that if you initiate synchronization on server 1, server 2 will be the source for it.
Please see some of our FAQs by using the link below:
http://www.starwindsoftware.com/starwind-faq#q_11
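
The direction rule above (the node where you start the sync is the destination, and its partner is the source) can be illustrated with a toy sketch. This is not a StarWind API. The `full_sync` function and the dictionary of block lists are hypothetical names used only to show why the choice of node matters.

```python
def full_sync(initiated_on: str, data: dict) -> dict:
    """Copy the partner's blocks over the node where sync was initiated.

    `data` maps exactly two node names to their list of blocks (hypothetical model).
    """
    if len(data) != 2 or initiated_on not in data:
        raise ValueError("expected exactly two nodes, one of them the initiator")
    # The partner of the initiating node is the source of the sync.
    source = next(name for name in data if name != initiated_on)
    result = dict(data)
    result[initiated_on] = list(data[source])
    return result


# server1 was replaced and holds stale data; server2 holds the current data.
before = {"server1": ["stale"], "server2": ["block-a", "block-b"]}
after = full_sync("server1", before)
```

Under this assumption, starting the sync on the replaced node pulls the current data from the surviving node. Starting it on the surviving node instead would overwrite the current data with the stale copy, so it is worth double-checking which server you are on before you begin.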
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com