Correct procedure after planned downtime

Wed Aug 27, 2014 8:06 pm

I'm still concerned about that very real scenario I last posted. Without manual intervention, the entire infrastructure cannot recover from a power outage. This is supposed to be high availability! Compared to a single node, this HA is actually worse from one point of view.

And it still is: HA assumes at least one node is up and running. Having all nodes down is an emergency situation.

But anyway, I got what you are saying and obviously you have a point: the less manual intervention is required, the fewer problems it should cause. I'll talk with our R&D team lead and project manager about this. Thanks!

epalombizio · Wed Aug 27, 2014 9:37 pm

Glad to see this is finally getting attention... In design, it would help if there was a way to provide a witness that could ensure that the correct server is considered the "master" node when automatically bringing an HA cluster up from being down.

In a two-node setup, I don't see how it's possible to be very sure which node is the master without other nodes acting as a witness. Having a 3rd (or 5th!) node as a witness could prevent a split-brain scenario. I've been bitten by this design flaw several times over the last few years.
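To illustrate the witness idea, here is a minimal sketch of a strict-majority vote. It is not how StarWind works; the function name `elect_master`, the node names and the voting rule are all assumptions made up for the example. Each reachable voter (data node or witness) names the node it believes holds the latest data, and a master is only chosen when more than half of all voters agree.

```python
from typing import Dict, Optional


def elect_master(votes: Dict[str, str], cluster_size: int) -> Optional[str]:
    """Return the node a strict majority of voters names as master, else None.

    votes maps each reachable voter (data node or witness) to the node it
    believes holds the latest data. cluster_size counts all voters,
    reachable or not. Hypothetical helper, not a StarWind API.
    """
    tally: Dict[str, int] = {}
    for candidate in votes.values():
        tally[candidate] = tally.get(candidate, 0) + 1

    needed = cluster_size // 2 + 1  # strict majority of all voters
    for candidate, count in tally.items():
        if count >= needed:
            return candidate
    return None  # no quorum: do not start serving automatically


# Two nodes that disagree can never reach a majority on their own.
two_node = elect_master({"node1": "node1", "node2": "node2"}, cluster_size=2)

# A third voter (the witness) breaks the tie.
with_witness = elect_master(
    {"node1": "node1", "node2": "node2", "witness": "node1"}, cluster_size=3
)
```

With two voters the disagreement stays unresolved (`two_node` is `None`), while the witness gives `node1` two of three votes. A single isolated node in a three-voter cluster also gets `None`, which is what prevents split brain.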

Cheers

robnicholson · Thu Aug 28, 2014 7:55 am

epalombizio wrote:In a two node setup, I don't see how it's possible to be very sure on which node is the master without other nodes acting as a witness. Having a 3rd ( or 5th!) node as a witness could prevent a split brain scenario.. I've been bitten by this design flaw several times over the last few years.

This was behind my suggestion of a maintenance mode operation, although couldn't the logic work something like this...

A typical UPS controlled shutdown would:

- Detect the power outage
- Instigate shutdown after a certain period (say 2 minutes)
- Issue a command to shut down virtual machines & standalone physical machines
- Issue a command sometime later to shut down the Hyper-V hosts
- Issue a command to shut down both SAN nodes

The above has to happen pretty quickly as a UPS under average load can't last that long - 30 minutes if you are lucky.
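The sequence above can be checked against the UPS runtime with a small sketch like the following. The helper `plan_shutdown` and the 5-minute allowance per step are assumptions for illustration only; the 2-minute grace period and the 30-minute runtime come from the figures above.

```python
from typing import List, Tuple


def plan_shutdown(
    steps: List[Tuple[str, int]], grace_s: int, runtime_s: int
) -> List[Tuple[int, str]]:
    """Return (start_offset_seconds, step_name) pairs for an ordered shutdown.

    steps: ordered (name, allowed_duration_seconds) pairs
    grace_s: how long to wait after the outage before starting
    runtime_s: estimated UPS runtime under the current load
    Raises ValueError if the whole plan would outlast the battery.
    """
    schedule: List[Tuple[int, str]] = []
    t = grace_s
    for name, duration in steps:
        schedule.append((t, name))
        t += duration
    if t > runtime_s:
        raise ValueError(f"plan needs {t}s but the UPS lasts about {runtime_s}s")
    return schedule


# Durations are made-up allowances, not measured values.
STEPS = [
    ("shut down VMs and standalone physical machines", 300),
    ("shut down Hyper-V hosts", 300),
    ("shut down both SAN nodes", 300),
]

plan = plan_shutdown(STEPS, grace_s=120, runtime_s=30 * 60)
for offset, name in plan:
    print(f"T+{offset:>4}s  {name}")
```

The point of the check is that the SAN nodes go last and still have time to flush before the battery runs out.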

So the two StarWind nodes receive notification that the host operating system is shutting down. At this time, hopefully most iSCSI targets will have disconnected, but not necessarily. This is doom and gloom time, though, so forcefully disconnecting targets is preferable to simply losing power mid-write.

At this time, both StarWind nodes are still in communication with each other - the network is still functional. Why can't they communicate with each other and work out who has the master copy?

BEGIN
Primary node #1 to #2 - you shutdown yet?
Node #2 - not quite, hang on - flushing stuff (holding up shutdown at this point which is perfectly acceptable)
Node #2 - okay, I've forcefully disconnected any targets I had left and flushed to disk
Primary node - okay, I got that and I'm going to flag myself as master. You get that node #2?
Node #2 - yes, I got that and I'm letting shutdown carry on
OR
Primary node - hello, node #2? Ohh you've gone - all bets off now
END

On power-up, the primary node knows it's got the master copy and starts processing iSCSI requests (like manually saying "Mark Sync"). If the other nodes come up first, then they don't do anything until either the primary node wakes up and starts synchronising with them or some manual intervention is carried out.
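A rough sketch of that handshake, purely to show the idea. The class `Node` and the functions `controlled_shutdown` and `may_serve_on_power_up` are hypothetical names, and in a real system the master flag would have to be written to disk so it survives the reboot.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    reachable: bool = True
    flushed: bool = False
    master_flag: bool = False  # would be persisted to disk in a real system

    def flush_and_disconnect(self) -> bool:
        """Force-disconnect remaining targets and flush to disk; report success."""
        if not self.reachable:
            return False
        self.flushed = True
        return True


def controlled_shutdown(primary: Node, secondary: Node) -> bool:
    """Run the shutdown handshake; return True if the primary was flagged master."""
    if not secondary.flush_and_disconnect():
        return False  # partner has already gone: all bets are off
    primary.flush_and_disconnect()
    primary.master_flag = True  # both sides agreed before power-off
    return True


def may_serve_on_power_up(node: Node) -> bool:
    """Only a node carrying the master flag starts serving iSCSI on its own."""
    return node.master_flag


node1, node2 = Node("node1"), Node("node2")
agreed = controlled_shutdown(primary=node1, secondary=node2)
```

After a successful handshake only `node1` may start serving by itself; `node2` waits for it or for manual intervention. If node #2 does not answer, no flag is set and the current manual procedure still applies.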

Cheers, Rob.

robnicholson · Thu Aug 28, 2014 7:57 am

That logic of course would work on a per device level. Cheers, Rob.

Fri Sep 05, 2014 7:11 am

Just wondering - have you tried shutting down both servers and powering them back up with LSFS devices? In that case there is a good chance of a FastSync.

arinci · Sat Sep 13, 2014 2:08 pm

I'm testing the Virtual SAN in HA configuration, using a couple of Windows Server 2012 R2 PCs that share 10GB of storage. On this shared storage I installed a Postgres database. Performance of this config is good for my scope and everything works fine on the database client side. The only problem is the manual procedure that must be done almost every time after the 2 PCs are shut down together, the same situation well described in this thread.

I'd like to propose the "Starwind HA storage solution" to my customers, but I cannot rely on their ability to restore the shared storage by themselves after a downtime. For this I'd like to know if there is a way to do this "automatically", in some way. I saw that a "StarwindX" API library is available to do something with shared storage configuration: is there a way to "force" a sync through this API? Any other suggestion?

Thanks everyone

anton (staff) · Sun Sep 14, 2014 10:33 am

There are too many people complaining, so we'll come up with some sort of solution soon. Both built-in and scriptable (so a deployment engineer could come up with their own). Let me have a meeting early next week and I'll post an update. Thanks!

arinci · Mon Sep 15, 2014 12:58 pm

anton (staff) wrote:There are too many people complaining, so we'll come up with some sort of solution soon. Both built-in and scriptable (so a deployment engineer could come up with their own). Let me have a meeting early next week and I'll post an update. Thanks!

That really sounds great!

Waiting for an update of VirtualSan... In the meantime, if you have a suggestion or an alternative idea to try (e.g. using the StarwindX library), I could start doing some internal testing.

Thanks!
Simon

robnicholson · Wed Sep 17, 2014 10:52 am

arinci wrote:I'd like to propose the "Starwind HA storage solution" to my customers, but I cannot rely on their ability to restore the shared storage by themselves after a downtime.

That is a scenario that I hadn't considered but a very valid one. In a small to medium sized business, IT could quite easily be outsourced with nobody on site with the skills to manually get things working again.

I still think there is an automatic way to handle this if the shutdown is controlled, i.e. the two nodes communicate with each other and decide who is going to be the primary on power-up. Even better would be for the nodes to come back up knowing they were both identical at shutdown and therefore carry on without any form of initial re-sync.
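As a sketch of the "identical at shutdown" idea: if each node recorded whether it shut down cleanly plus a counter of the last write it applied, the power-up decision could look roughly like this. `ShutdownRecord`, `startup_action` and the returned labels are invented for the example and say nothing about what StarWind actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownRecord:
    clean: bool      # True if the node completed a controlled shutdown
    write_seq: int   # counter of the last write applied to the device


def startup_action(a: ShutdownRecord, b: ShutdownRecord) -> str:
    """Decide what two partner nodes should do when both are back up."""
    if a.clean and b.clean and a.write_seq == b.write_seq:
        return "serve-without-resync"   # both copies were identical at shutdown
    if a.clean and b.clean:
        return "sync-from-newer"        # trust the higher counter
    return "needs-decision"             # unclean shutdown: fall back to other rules


same = startup_action(ShutdownRecord(True, 42), ShutdownRecord(True, 42))
behind = startup_action(ShutdownRecord(True, 42), ShutdownRecord(True, 40))
dirty = startup_action(ShutdownRecord(False, 42), ShutdownRecord(True, 42))
```

Only the first case skips the initial re-sync entirely; anything less certain drops back to a sync or to a decision by some other rule or by the operator.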

Cheers, Rob.

anton (staff) · Sun Sep 21, 2014 3:44 pm

I've discussed the situation with developers and it looks like there's a major confusion as my information was a bit outdated:

1) StarWind *DOES* track what was written last (some sort of a log) and tries to do automatic recovery in case all replicas went down @ the same time.

2) StarWind *DOES* let the admin override this and start recovery from any node the admin points to (so-called "mark as synchronized") if 1) did not work for some reason.

So what *exactly* would you like to see in terms of automatic recovery after a completely broken shutdown (the admin configured all nodes w/o a UPS and managed to bring all cluster nodes down ungracefully)?

arinci · Sun Sep 21, 2014 7:39 pm

anton (staff) wrote:I've discussed the situation with developers and it looks like there's a major confusion as my information was a bit outdated:

I'd like to propose my idea, just my 2 cents, on how to manage this kind of situation. You wrote that each node sharing the storage logs the time of the last update. That way each node knows, after receiving the relevant info from the other participants, which node holds the latest update of the shared storage. This is good, but what happens if all nodes are switched off at different moments and then restarted? I'm thinking of two different scenarios:
1 - All the nodes are switched ON at the same time: they are able to talk to each other and rebuild the shared storage properly.
2 - Nodes are switched ON at different times: when the Starwind service starts, it doesn't know whether the local image holds the latest update or not, so the only safe thing to do is wait for the other node(s) before making the HA storage available via the iSCSI interface. What to do if a server refuses to start? Some manual intervention is needed to mark one of the available participants of the shared storage as synchronized.

My suggestion, in case a fully automatic restore is desired in every case, is the following:
1) When the Starwind service starts, it waits for all the participants to be ready for a defined amount of time (user configurable, let's say 5 minutes).
2) If all the nodes become available in time, the shared storage can be properly synced and started. If not all the nodes become available in time, the available nodes select the node that holds the latest update...
I'm aware that this kind of automatic logic may be dangerous for certain uses, so I suggest having an option to enable/disable this feature.
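The two steps above could be sketched like this. Everything here is hypothetical: `choose_source`, the per-node "last-write counter" and the `auto_recover` switch are stand-ins for whatever the service really tracks, and the 300-second default is just the 5 minutes mentioned in step 1.

```python
from typing import Dict, Optional, Set


def choose_source(
    expected: Set[str],
    ready: Dict[str, int],
    waited_s: int,
    timeout_s: int = 300,
    auto_recover: bool = True,
) -> Optional[str]:
    """Return the node to sync from, or None to keep waiting / ask an operator.

    expected: names of all participants of the shared storage
    ready: nodes that have come up, mapped to their last-write counter
           (higher means more recent)
    waited_s: seconds elapsed since the service started
    """
    if not ready:
        return None
    all_present = set(ready) >= expected
    if not all_present:
        if waited_s < timeout_s:
            return None  # keep waiting for the missing partners
        if not auto_recover:
            return None  # feature disabled: operator must mark a node as synchronized
    # sorted() makes ties deterministic: the lowest name among equals wins
    return max(sorted(ready), key=lambda n: ready[n])


both_up = choose_source({"n1", "n2"}, {"n1": 10, "n2": 12}, waited_s=0)
still_waiting = choose_source({"n1", "n2"}, {"n1": 10}, waited_s=60)
timed_out = choose_source({"n1", "n2"}, {"n1": 10}, waited_s=300)
```

The risky case is `timed_out`: `n1` is picked although the missing `n2` might hold newer data, which is exactly why the feature should be switchable (`auto_recover=False` returns `None` instead).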

What do you think? Is it just a dream, or do you think there is a chance that this dream comes true?

Simon

anton (staff) · Sun Sep 21, 2014 8:44 pm

Simon, that's exactly how it works now... It automatically finds out who has the most recent copy of the data, and then that node feeds updates to its partners. Manual intervention is also allowed (the operator can forcibly point to the node whose content HE THINKS is the most recent and should be the "master" one).

arinci · Mon Sep 22, 2014 8:44 am

anton (staff) wrote:Simon, that's exactly how it works now... It automatically finds out who has the most recent copy of the data, and then that node feeds updates to its partners. Manual intervention is also allowed (the operator can forcibly point to the node whose content HE THINKS is the most recent and should be the "master" one).

Hi anton,
I think, but maybe I'm wrong, that's NOT exactly how it works now: when you switch ON a couple of partner servers, sometimes they do not agree on which node holds the latest update, so manual intervention is required. Probably this can be fixed so that "Auto Synchronization after Failure" works 100% of the time when all the servers are available at startup.
It's a different situation when you switch on both servers (e.g. after a blackout) and for some reason Windows refuses to boot on one of the two: in this case the only way to restore the VSAN is to manually use the "Mark as Synchronized" feature. In my previous post I tried to suggest a fully automated way to recover even when one of the servers is not present. Try to discuss what I'm proposing with the developers; maybe they'll pick up some ideas.

In the end my goal is to have shared storage that is robust enough to work without manual intervention even after a complete blackout. For this I'm ready to implement something custom (using PowerShell scripting or through the StarwindX COM DLL), but I need support from you to develop this.

Thanks for your attention,
Simon

robnicholson · Mon Sep 22, 2014 12:08 pm

anton (staff) wrote:StarWind *DOES* track what was written last (some sort of a log) and tries to do automatic recovery in case all replicas went down @ the same time.

All I can say is that it doesn't do automatic recovery in this pretty well defined clean shutdown & power-up sequence:

Clean shutdown:
- Shut down the server with iSCSI connections to the StarWind HA cluster
- Double-check in the console that all targets have disconnected
- Shut down StarWind node #2
- Shut down StarWind node #1

Clean power-up:
- Power up StarWind node #1 and wait until the service is running (can connect via console)
- Power up StarWind node #2 and wait likewise

On power-up neither node starts accepting iSCSI connections and one has to manually define one node as synchronised and the other will then re-synchronise.

Cheers, Rob.

robnicholson · Mon Sep 22, 2014 12:13 pm

Rather worryingly, in that last test of clean shutdown and power-up, node #1 lost one of its devices as well. See attached screenshot.

I have been messing around a lot with the lab here so I'm going to re-create all the storage.

Cheers, Rob.