2 node controlled shutdown / VMware / Powerchute Network Shutdown

Posted: Tue Feb 13, 2018 7:13 pm
by gsxesx
I’ve opened a ticket with support on this, but I’m posting here as well in the hope that someone has run into this situation before and come up with a solution or workaround.

Here’s our environment:
Two ESXi 6.5 hosts managed by a vCenter Server running as a virtual appliance.
Each host runs a copy of the Starwind Virtual Appliance (SVA) on Windows Server 2016 VMs.
These SVAs present the local storage of each node back to the VMware infrastructure as intended.
Both nodes are protected by a single UPS with three hours of runtime.
APC Powerchute Network Shutdown running as a Virtual Appliance.
For maintenance and HA, both the Powerchute and vCenter appliances are on iSCSI targets presented by the SVAs, and therein lies the problem.

Per APC’s installation guide (page 82), here’s the normal sequence of events should the UPS go on battery for a predetermined period:

1. PowerChute reports that the UPS is on battery.
2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.
3. PowerChute starts a maintenance mode task on each host and then starts VM shutdown followed by vApp shutdown.
4. VM/vApp shutdown durations elapse.
5. PowerChute gracefully shuts down the vCenter Server VM.
6. vCenter VM shutdown duration elapses. PowerChute starts executing the shutdown command file.
7. Shutdown command file duration elapses and PowerChute gracefully shuts down the VMware hosts that are not running the vCenter Server or PowerChute VM.
8. PowerChute shuts down the VMware host running vCenter Server VM followed by the host running PowerChute VM.

The problem is that at step 3, even with sequenced shutdown, the Starwind VMs are powered off and the iSCSI targets become unavailable. In testing, we’ve seen that this means, at the least, that the hosts themselves never power off and, at the worst, that the vCenter Server vmdk is corrupted.
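
To make the ordering constraint concrete, here is a minimal Python sketch (not anything PowerChute does itself). It models the dependencies described above and derives a shutdown order in which everything living on the iSCSI targets stops before the Starwind VMs. All component names are hypothetical placeholders, and it assumes Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical component names. Each key maps to the set of things it
# depends on (i.e. the things that must be running for it to work).
DEPENDS_ON = {
    "esxi-host-1": set(),
    "esxi-host-2": set(),
    "starwind-vm-1": {"esxi-host-1"},
    "starwind-vm-2": {"esxi-host-2"},
    "iscsi-datastore": {"starwind-vm-1", "starwind-vm-2"},
    "vcenter-appliance": {"iscsi-datastore"},
    "powerchute-appliance": {"iscsi-datastore"},
}


def shutdown_order(depends_on):
    """Return components in a safe shutdown order (dependents first).

    static_order() yields dependencies before dependents, which is a
    valid startup order; reversing it gives a valid shutdown order.
    """
    startup = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(startup))


if __name__ == "__main__":
    for step, component in enumerate(shutdown_order(DEPENDS_ON), start=1):
        print(step, component)
```

The sketch only expresses the constraint that the Starwind VMs have to go down after vCenter and PowerChute. It does not solve the second half of the problem: something still has to be running to shut down the Starwind VMs and the hosts once PowerChute itself is off.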

In the past, Starwind support has recommended moving the Powerchute software to another physical host. While this is agreeable, it only solves half of the equation as it does not address what to do with the vCenter Appliance.

Some may suggest running vCenter on a physical machine, perhaps the same one as the Powerchute. This is a twofold problem: 1. It takes what is now an HA appliance and turns it into a single point of failure. 2. Perhaps more importantly, since 6.5 VMware’s preferred and recommended installation method for vCenter has been as an appliance.

Admittedly this is a bit of a catch-22. Again, hopefully someone has run across this before and come up with a creative solution.

Re: 2 node controlled shutdown / VMware / Powerchute Network Shutdown

Posted: Wed Feb 14, 2018 12:47 am
by Benoire
Does PowerChute allow you to run scripts? You could script it using PowerCLI to stop all the VMs and then enter maintenance mode / shut down, and you shouldn't need vCenter running for that if you connect directly to the hosts themselves. Clearly PowerChute would need to be on another machine; otherwise you'd have to issue a stop command for the host after all the other VMs have shut down, and the PowerChute VM would be powered off ungracefully.

I've got a three-host setup on VMware vSAN that I'm about to move to a 2 node config with SW for various reasons. My Dell 1920W UPS has the management card, so I intend to use Dell MUMC to control a PowerCLI script to do the following:

Power Loss
On power loss + 10 minutes, turn off HA/DRS and evacuate from Host 2 all VMs that are required to remain up for as long as possible, then shut down Host 2 using the vCenter connection in MUMC.
On power loss + 30 minutes, shut down all VMs on Host 1 and initiate a host shutdown over a direct connection to the host, as vCenter will also be gone.
On 5% UPS power left, initiate Synology NAS shutdown including the VM holding MUMC and turn off UPS until power restored.
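
As a rough illustration of the staged plan above, the trigger logic could be sketched like this in Python. This is not MUMC's or PowerCLI's API: the stage names, the `due_stages` function and its parameters are all made up for the example, and the real actions would be whatever scripts the UPS software calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str                               # hypothetical label for the action
    after_minutes: Optional[float] = None   # fire after this long on battery
    below_percent: Optional[float] = None   # fire at or below this charge level


# The three stages from the plan above, in order.
STAGES = [
    Stage("evacuate-and-stop-host-2", after_minutes=10),
    Stage("stop-vms-and-host-1", after_minutes=30),
    Stage("stop-nas-and-ups", below_percent=5),
]


def due_stages(minutes_on_battery, battery_percent, already_run):
    """Return the names of stages that should fire now, in plan order."""
    due = []
    for stage in STAGES:
        if stage.name in already_run:
            continue
        timed_out = (stage.after_minutes is not None
                     and minutes_on_battery >= stage.after_minutes)
        battery_low = (stage.below_percent is not None
                       and battery_percent <= stage.below_percent)
        if timed_out or battery_low:
            due.append(stage.name)
    return due
```

A caller would poll the UPS, pass in the elapsed time on battery and the remaining charge, run whatever is returned, and add those names to `already_run` so each stage fires only once.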

Power Restoration
The two hosts plus the NAS should come back online once power is restored. I don't believe I can tell it to wait for the battery to charge back up to a certain percentage.
The script should then turn on DRS/HA and power on the VMs that are required to run 24/7.
DRS/HA will then balance VMs back across the hosts as required.

VMs will be contained in various groups that should allow for this to occur, e.g. a group for VMs that can be shut down, a group for VMs to be moved to Host 1, etc. This way I can add and remove VMs from the groups without affecting the script. I haven't started to write this yet, as I need to migrate to the new setup first, but I understand from various people that PowerCLI can do these sorts of actions, and all I need is a way to call the scripts at various power levels from the UPS, which MUMC can do.
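
The group idea can be sketched in a few lines of Python to show why the script stays unchanged when VMs are added or removed. The group names, VM names and the `plan_host2_evacuation` helper are all hypothetical; in practice the membership would come from folders or tags in the inventory rather than a hard-coded dict.

```python
# Hypothetical group membership; editing this data is the only change
# needed when a VM is added to or removed from a group.
GROUPS = {
    "can-shut-down": ["build-agent", "test-db"],
    "move-to-host-1": ["file-server", "domain-controller"],
}


def plan_host2_evacuation(groups):
    """Turn group membership into an ordered list of (action, vm) pairs."""
    actions = [("shutdown", vm) for vm in groups.get("can-shut-down", [])]
    actions += [("migrate-to-host-1", vm) for vm in groups.get("move-to-host-1", [])]
    return actions
```

The function only reads whatever is in the groups at the time it runs, so the plan follows the membership without any edits to the logic.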

If Powerchute can do something similar, then this might be the best way for you, but you would need an additional machine, such as a NAS/SAN that can run other software either as an application or a VM, to run Powerchute on.

Re: 2 node controlled shutdown / VMware / Powerchute Network Shutdown

Posted: Mon Feb 19, 2018 10:21 am
by Oleg(staff)
Hi Benoire,
Thank you for sharing the solution!