Automatic cluster shutdown

Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Mon Aug 21, 2017 7:40 am

Hello!

I'm trying to implement a correct shutdown procedure for my Hyper-V cluster in case of a power outage.
We have Host 1 and Host 2 with StarWind vSAN (free version) installed on them. Both are connected to UPS1 and UPS2 (Smart-UPS SUA1500) in a redundant fashion.
The heartbeat and synchronization channels use direct 10 Gbit connections.
Host 1 has first sync priority. Host 2 has second sync priority.

Shutdown sequence:

1. After the power goes down, Host 2 waits for 10 minutes on battery.
2. Then the HA VMs migrate to Host 1 (3-minute timeout).
3. Then the other VMs on Host 2 begin to shut down (2-minute timeout).
4. After that, Host 2 shuts itself down.
5. Host 1 waits for 20 minutes on battery power, then begins to shut down all running VMs (2-minute timeout).
6. After that, Host 1 shuts itself down.
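
As a sanity check on the sequence above, the timings can be sketched in a few lines of Python. This is only an illustrative model: the `HostPlan` class and its field names are made up for this example, and it ignores the time the host OS itself needs to shut down, so real figures will be somewhat larger.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HostPlan:
    """Hypothetical description of one host's shutdown plan (all values in minutes)."""
    name: str
    wait_on_battery: int      # time spent on battery before any action is taken
    migrate_timeout: int      # time allowed for migrating HA VMs (0 if none)
    vm_shutdown_timeout: int  # time allowed for shutting down the remaining VMs

    def power_off_at(self) -> int:
        """Worst-case minute, counted from the outage, at which the host begins to power off."""
        return self.wait_on_battery + self.migrate_timeout + self.vm_shutdown_timeout


def margin_between_hosts(first: HostPlan, last: HostPlan) -> int:
    """Minutes between the first host powering off and the last host starting its shutdown."""
    return last.wait_on_battery - first.power_off_at()


def required_runtime(plans: Iterable[HostPlan]) -> int:
    """Minimum battery runtime (minutes) needed to complete every plan."""
    return max(plan.power_off_at() for plan in plans)


# Values taken from the shutdown sequence in this post.
host2 = HostPlan("Host 2", wait_on_battery=10, migrate_timeout=3, vm_shutdown_timeout=2)
host1 = HostPlan("Host 1", wait_on_battery=20, migrate_timeout=0, vm_shutdown_timeout=2)

if __name__ == "__main__":
    print("Host 2 powers off at minute", host2.power_off_at())
    print("Host 1 powers off at minute", host1.power_off_at())
    print("Margin between hosts:", margin_between_hosts(host2, host1), "minutes")
    print("Battery runtime needed:", required_runtime([host1, host2]), "minutes")
```

With the numbers from the list, Host 2 is off by minute 15 at the latest, Host 1 starts shutting down at minute 20 and is off by minute 22, which leaves a 5-minute margin between the hosts and requires at least 22 minutes of battery runtime.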

Did I miss something, or is everything correct?

I'd like to better understand StarWind's behavior in case of a total blackout, because when Host 1 started again, it couldn't connect to the iSCSI targets and the cluster couldn't start.
Its StarWind devices were in the 'Not synchronized' state until I started Host 2; after the connection between them was established, the status changed to 'Synchronized' and the iSCSI targets became available again.
Full synchronization completed without errors, and the HA VMs started with no problems. Should I manually mark my StarWind devices on Host 1 as synchronized before starting the second host?
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Mon Aug 21, 2017 12:16 pm

Hello, Alexey Sergeev, and thank you for your question. Please check this article from the StarWind Knowledge Base, which lists all the reasons for a full sync:
https://knowledgebase.starwindsoftware. ... may-start/

After both nodes of the HA cluster have been down simultaneously, in some cases StarWind is not able to determine which node holds the most recent data. So it blocks all incoming connections until synchronization begins, to prevent data corruption. If you know for sure which node was shut down last, you have two options: choose it as the synchronization source in the drop-down menu of the HA device (Synchronization), or mark the node with the current data as synchronized by right-clicking the HA device and choosing the corresponding option. With the free version, you can use the GetHASyncState.ps1 script. By default, it is located in C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell.

The second option assumes that synchronization starts automatically. If you are not sure which node holds the most recent data, delete the HA targets from the StarWind Management Console and mount the HA images as basic targets. Important: at this step you only need to find out whether the data is consistent, so make sure that no new data is written to the target. Check which node has the most recent data, then remove the target and recreate the HA device, running the synchronization in the appropriate direction.
Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Mon Aug 21, 2017 12:57 pm

I've read the article, thank you.
So if I know for sure which host was shut down first and which last, I can rely on automatic synchronization simply by bringing the hosts online one by one.
And then they will synchronize according to their priorities, right?

I was a bit puzzled to see that all devices were not synchronized after Host 1 restarted. While reviewing the system logs, I found that they changed their state right before the StarWind service stopped on Host 1.
Is this by design? So must we either manually choose the synchronization source or just rely on auto-sync?
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Mon Aug 21, 2017 3:39 pm

Both nodes were turned off at the same time, and that is the reason for the full sync. To avoid data loss, we recommend doing the following:
1. Stop the StarWind service on node 2.
2. Mark the device as synchronized on node 1 and check whether the data on it is current.
3. If so, start the StarWind service on node 2 and wait for the full sync.
Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Mon Aug 21, 2017 5:59 pm

No, the nodes weren't stopped at the same time. In my first post I wrote that Host 2 shut itself down first, and only after that was Host 1 stopped. Right before Host 1 began to shut down, I verified that all devices on it were synchronized.
So I expected that they would keep their state after I restarted it, but they didn't.
I understand that full synchronization is needed anyway. I just want to make sure that the hosts can shut themselves down gracefully without anybody's help in case of a power outage, at midnight for example.
Sergey (staff)
Staff
Posts: 86
Joined: Mon Jul 17, 2017 4:12 pm

Wed Aug 23, 2017 12:46 pm

After all cluster nodes have been down and are up again, StarWind tries to determine which node has the most recent data (a transaction log is kept on every node) and whether that data is in a consistent state (caches flushed properly, so there are no partial transactions). If so, StarWind synchronizes automatically and brings the virtual LUNs online so they can serve clients. If not, StarWind waits for an operator to manually select the node that should get the "synchronized" status and to start synchronization.
If caches were not flushed and were purged, the automatic process never starts and StarWind waits for operator intervention. If power outages are a serious concern, consider using write-through cache rather than write-back. A write-through cache makes sure that the previous write command has completed before the next data packet is written.
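
The decision described above can be modelled roughly as follows. This is a simplified sketch of the logic as explained in this post, not StarWind's actual implementation: `NodeState`, `last_transaction` and `cache_flushed` are hypothetical names standing in for the per-node transaction log and the "caches flushed properly" condition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeState:
    """Hypothetical snapshot of one node after all nodes have come back up."""
    name: str
    last_transaction: int  # assumed: position of the newest entry in the node's transaction log
    cache_flushed: bool    # assumed: True if caches were flushed cleanly before the node went down


def pick_sync_source(nodes: List[NodeState]) -> Optional[str]:
    """Return the name of the node to synchronize from, or None if an operator must decide."""
    newest = max(node.last_transaction for node in nodes)
    # Only nodes holding the most recent data are candidates for the sync source.
    candidates = [node for node in nodes if node.last_transaction == newest]
    for node in candidates:
        # The data must also be consistent, i.e. no partial transactions left in a lost cache.
        if node.cache_flushed:
            return node.name
    # Most recent data is not in a consistent state: wait for manual intervention.
    return None


if __name__ == "__main__":
    clean = [NodeState("Host 1", 120, True), NodeState("Host 2", 100, True)]
    dirty = [NodeState("Host 1", 120, False), NodeState("Host 2", 100, True)]
    print(pick_sync_source(clean))  # Host 1 was shut down last and flushed its cache
    print(pick_sync_source(dirty))  # newest data may be incomplete, so no automatic choice
```

In this model, a node that went down last but lost its write-back cache never becomes the automatic source, which matches the advice to consider write-through caching where power outages are a concern.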
Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Wed Aug 23, 2017 7:55 pm

Now I get it, thank you, Sergey! That's what I needed to know.
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Thu Aug 24, 2017 3:21 pm

Hello Alexey,
You are always welcome.
Please do not hesitate to ask any additional questions.
tiaccadi
Posts: 9
Joined: Tue Oct 24, 2017 7:09 am

Fri Dec 08, 2017 5:50 pm

Hello,

I have to wake this thread up again, because I have some doubts regarding vSAN shutdown and restart.

Right now I have a running hyperconverged Hyper-V + StarWind cluster with a couple of HA images configured.
I'm able to shut down one node (e.g. node 1), and the other one (node 2) takes over all the resources (i.e. Hyper-V and StarWind images).
Once I power node 1 on again, as soon as the StarWind services start, what I think is a fast sync occurs (please confirm this): after a few seconds the HA image is served by both nodes.
Obviously, the behavior is the same if I switch the nodes (node 2 shut down, node 1 still running).

What I'm not sure about is what will happen in these situations:

1) node 1 is shut down gracefully; after 5 minutes, node 2 is also shut down gracefully; after 5 minutes, both nodes are switched on again (at almost the same time)
2) nodes 1 and 2 shut down gracefully but at more or less the same time; after 5 minutes, both nodes are switched on again (at almost the same time)
3) nodes 1 and 2 power off unexpectedly (e.g. a power outage without a UPS, or the UPS running out of battery); once power is back, both nodes are switched on again (at almost the same time)

What will happen in these situations, and what am I supposed to do after the nodes wake up again?
Moreover, will these 3 scenarios (and the automatic / manual activities required) change if I don't switch on both nodes at the same time (maybe after identifying which one was shut down last)?

Thank you!
Michael (staff)
Staff
Posts: 317
Joined: Thu Jul 21, 2016 10:16 am

Mon Dec 11, 2017 10:17 am

Hello Tiaccadi,
Here is a link describing when full synchronization can occur: https://knowledgebase.starwindsoftware. ... may-start/
To answer your questions: if you gracefully shut down one of the nodes, StarWind will do a fast synchronization when it comes back up.
1) node 1 is shut down gracefully; after 5 minutes, node 2 is also shut down gracefully; after 5 minutes, both nodes are switched on again (at almost the same time)
2) nodes 1 and 2 shut down gracefully but at more or less the same time; after 5 minutes, both nodes are switched on again (at almost the same time)
According to the KB article, StarWind will do a full synchronization if both StarWind services were stopped. By design, the StarWind services will identify which devices have the most recent data and will start full synchronization in the correct direction.
3) nodes 1 and 2 power off unexpectedly (e.g. a power outage without a UPS, or the UPS running out of battery); once power is back, both nodes are switched on again (at almost the same time)
In this case, both devices could stay not synchronized after the nodes come back up. To resolve this situation, the device with the most recent data should be marked as Synchronized manually. This KB article should be helpful in such a case: https://knowledgebase.starwindsoftware. ... -blackout/
Additionally, the next StarWind build will have a Maintenance mode for HA devices: you can put the devices into Maintenance mode, shut down / turn off both servers, and once the servers are back up and leave Maintenance mode, StarWind will do a fast synchronization. It is coming very soon!
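
For quick reference, the cases answered above can be summarised as a small lookup. This is only a sketch of what this reply states, with hypothetical names (`Recovery`, `expected_recovery`); it does not cover cases the thread leaves open, such as a single node losing power unexpectedly, and it does not model the upcoming Maintenance mode.

```python
from enum import Enum


class Recovery(Enum):
    """Outcomes as described in this reply (labels are illustrative)."""
    FAST_SYNC = "fast synchronization, started automatically"
    FULL_SYNC = "full synchronization, started automatically"
    MANUAL = "devices may stay not synchronized; mark the newest one as Synchronized manually"


def expected_recovery(nodes_down: int, graceful: bool) -> Recovery:
    """Map a two-node outage scenario to the recovery behavior described in the thread."""
    if nodes_down not in (1, 2):
        raise ValueError("this sketch covers a two-node cluster only")
    if nodes_down == 1:
        if graceful:
            # One node shut down gracefully while the other kept running.
            return Recovery.FAST_SYNC
        raise ValueError("a single unexpected power-off is not covered in this thread")
    if graceful:
        # Scenarios 1 and 2: both StarWind services were stopped gracefully.
        return Recovery.FULL_SYNC
    # Scenario 3: both nodes lost power unexpectedly.
    return Recovery.MANUAL


if __name__ == "__main__":
    for nodes_down, graceful in [(1, True), (2, True), (2, False)]:
        outcome = expected_recovery(nodes_down, graceful)
        print(f"nodes down={nodes_down}, graceful={graceful}: {outcome.value}")
```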
bienvenu
Posts: 2
Joined: Fri Mar 02, 2018 9:31 am

Fri Mar 02, 2018 9:38 am

Hello, I found this discussion very interesting, but the last answer raised a question for me: how can the 'Mark as Synchronized' command for an HA device be executed in PowerShell?
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Mar 05, 2018 3:06 pm

In the StarWindX samples folder (C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell) have a look at the MarkAsSynchronized.ps1 script. Feel free to modify it according to your needs.
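
If the script needs to be launched from another tool (for example from a startup or recovery job), a wrapper along these lines might work. This is a sketch under assumptions: the helper names `build_command` and `run_script` are made up for illustration, the script is assumed to have already been edited with your own host and device details, and whether it accepts command-line parameters is not stated in this thread, so none are passed.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List

# Default samples folder as given in this thread.
SAMPLES_DIR = PureWindowsPath(
    r"C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell"
)


def build_command(script_name: str = "MarkAsSynchronized.ps1") -> List[str]:
    """Build the powershell.exe command line that runs a script from the samples folder."""
    script_path = SAMPLES_DIR / script_name
    return [
        "powershell.exe",
        "-NoProfile",                   # skip user profile scripts
        "-ExecutionPolicy", "Bypass",   # allow the script to run for this invocation only
        "-File", str(script_path),
    ]


def run_script(script_name: str = "MarkAsSynchronized.ps1") -> int:
    """Run the script and return its exit code (only meaningful on the StarWind host itself)."""
    completed = subprocess.run(build_command(script_name), check=False)
    return completed.returncode
```

Calling `build_command()` first and reviewing the resulting command is a safe way to check the path before anything is actually executed with `run_script()`.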