2 node controlled shutdown / VMware / Powerchute Network Shutdown
Posted: Tue Feb 13, 2018 7:13 pm
I’ve opened a ticket with support on this, but I’m posting here as well in the hope that someone has run into this situation before and come up with a solution or workaround.
Here’s our environment:
Two ESXi 6.5 hosts managed by a vCenter Server running as a virtual appliance.
Each host runs a copy of the Starwind Virtual Appliance (SVA) on a Windows Server 2016 VM.
These SVAs present the local storage of each node back to the VMware infrastructure as intended.
Both nodes are protected by a single UPS with three hours of runtime.
APC PowerChute Network Shutdown running as a virtual appliance.
For maintenance and HA, both the PowerChute and vCenter appliances sit on iSCSI targets presented by the SVAs, and therein lies the problem.
Per APC’s installation guide (page 82), here’s the normal sequence of events should the UPS stay on battery for a predetermined period:
1. PowerChute reports that the UPS is on battery.
2. Shutdown delay for the On Battery event elapses. PowerChute sends a command to turn off the UPS or Outlet Group.
3. PowerChute starts a maintenance mode task on each host and then starts VM shutdown followed by vApp shutdown.
4. VM/vApp shutdown durations elapse.
5. PowerChute gracefully shuts down the vCenter Server VM.
6. vCenter VM shutdown duration elapses. PowerChute starts executing the shutdown command file.
7. Shutdown command file duration elapses and PowerChute gracefully shuts down the VMware hosts that are not running the vCenter Server or PowerChute VM.
8. PowerChute shuts down the VMware host running vCenter Server VM followed by the host running PowerChute VM.
The problem is that at step 3, even with sequenced shutdown, the Starwind VMs are powered off and the iSCSI targets become unavailable while vCenter and PowerChute are still running on them. In testing, we’ve seen that at the least the hosts themselves never power off, and at worst the vCenter Server vmdk is corrupted.
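To make the ordering problem concrete, here is a rough Python sketch of the dependency we are up against. It is only a model, not anything PowerChute or vCenter exposes: the VM names, the `depends_on` map and both helper functions are made up for illustration. It lists which VMs get stranded when the SVAs go down in the general VM shutdown, and what a storage-last order would look like.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical VM names. Each VM maps to the VMs whose storage it lives on.
depends_on = {
    "vcenter": {"sva-node1", "sva-node2"},
    "powerchute": {"sva-node1", "sva-node2"},
    "app-vm": {"sva-node1", "sva-node2"},
    "sva-node1": set(),
    "sva-node2": set(),
}


def safe_shutdown_order(depends_on):
    """Return VMs ordered so that every VM stops before the storage it uses."""
    # static_order() yields dependencies first (a valid start-up order),
    # so reversing it gives a valid shutdown order.
    startup = list(TopologicalSorter(depends_on).static_order())
    return startup[::-1]


def stranded_vms(order, depends_on):
    """List VMs that are still running when something they depend on stops."""
    stopped = set()
    stranded = []
    for vm in order:
        for other, deps in depends_on.items():
            if vm in deps and other not in stopped and other not in stranded:
                stranded.append(other)
        stopped.add(vm)
    return stranded


# Roughly what we see today: the SVAs go down with the ordinary VMs,
# before the vCenter and PowerChute appliances.
observed = ["app-vm", "sva-node1", "sva-node2", "vcenter", "powerchute"]
print("stranded today:", stranded_vms(observed, depends_on))

wanted = safe_shutdown_order(depends_on)
print("storage-last order:", wanted)
print("stranded with that order:", stranded_vms(wanted, depends_on))
```

The catch, of course, is that in this model PowerChute itself depends on the SVAs, so whatever drives the last steps of the sequence has to run from somewhere that does not live on the iSCSI targets.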
In the past, Starwind support has recommended moving the PowerChute software to another physical host. While this is agreeable, it only solves half of the equation, as it does not address what to do with the vCenter appliance.
Some may suggest running vCenter on a physical machine, perhaps the same one as PowerChute. This is a two-fold problem. 1. It takes what is now an HA appliance and turns it into a single point of failure. 2. Perhaps more importantly, since 6.5 VMware’s preferred and recommended installation method for vCenter has been as an appliance.
Admittedly this is a bit of a catch-22. Again, hopefully someone has run across this before and come up with a creative solution.