Boris (staff) wrote: From 11818 on, StarWind VSAN has got Maintenance Mode for the HA devices...
You're a good man to point that out. It will be implemented today!
Boris (staff) wrote: ...have a look at https://www.starwindsoftware.com/resour ... powerchute ...
Great script, and StarWind works perfectly with maintenance mode.
Boris (staff) wrote: Did the script work as expected for you?
Actually, something is not quite right with the StarWind bit.
Code:
PS C:\Windows\system32> C:\Scripts\SW-SyncState.ps1
Device sync state, node 1:
iqn.2008-08.com.starwindsoftware:map59-n1-target3 Status: 1, completed 0%
Device sync state, node 2:
iqn.2008-08.com.starwindsoftware:map59-n2-target3 Status: 2, completed 5%
PS C:\Windows\system32>
Code:
Putting Starwind in maintenance mode
HAImage3: Entered maintenance mode
imagefile3: Not an HA device
Code:
Putting Starwind in maintenance mode
Operation cannot be completed. Maintenance mode is already turned on.
Code:
Taking Starwind out of maintenance mode
HAImage3: Operation cannot be completed. Maintenance mode is already turned off.
imagefile3: Not an HA device
Code:
Taking Starwind out of maintenance mode
HAImage3: Operation cannot be completed. Maintenance mode is already turned off.
imagefile3: Not an HA device
Code:
Warning: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: maintenance mode is turning ON...
...Two seconds later, all of these events were logged with the same timestamp:
Warning: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: synchronization connection IP 192.168.20.2 with partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 lost.
Warning: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: heartbeat connection IP 192.168.40.2 with partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 lost.
Warning: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: synchronization connection IP 192.168.21.2 with partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 lost.
Error: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: all synchronization connection with partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 lost.
Error: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 state has changed to "Not synchronized".
Warning: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: heartbeat connection IP 192.168.10.22 with partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 lost.
Error: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: all heartbeat connection with partner node iqn.2008-08.com.starwindsoftware:map59-n2-target3 lost.
Error: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: maintenance mode operation is rolled back for current node.
Three seconds later...
Error: HA Device iqn.2008-08.com.starwindsoftware:map59-n1-target3: current node state has changed to "Not synchronized".
Information: Service is stopped - StarWind Virtual SAN v8.0.0 (Build 12767, [SwSAN], Win64).
Code:
$NodeFile = "C:\Scripts\nodes.tmp"
$NodeName = (Get-WmiObject win32_computersystem).DNSHostName+"."+(Get-WmiObject win32_computersystem).Domain
# Save the cluster node names for the ExitMaintenance script to use later
$nodes = (Get-ClusterNode).Name
$nodes | Set-Content -Path $NodeFile
# Shut down VMs on this node, and prevent live migration by setting the owner node to just this node
Write-Host "Stopping $((Get-VM).Count) virtual machines"
$VMs = (Get-VM).Name
Get-VM | Stop-VM -Save -AsJob > $null 2>&1
#Get-VM | Stop-VM -AsJob > $null 2>&1
Get-ClusterResource | ? {($VMs -contains $_.OwnerGroup) -and ($_.ResourceType -eq "Virtual Machine")} | Set-ClusterOwnerNode -Owners (Get-WmiObject win32_computersystem).DNSHostName
# Wait until all virtual machine resources on all cluster nodes are offline
Write-Host "Waiting for all virtual machines to stop"
do { sleep 1 } while (((Get-ClusterResource | ? {($_.ResourceType -eq "Virtual Machine") -and ($_.State -ne "Offline")}) | measure).Count -gt 0)
# take all cluster shared volumes offline
Write-Host "Taking cluster shared volumes offline"
do {
Sleep 1
$OnlineCSVs = Get-ClusterSharedVolume | ? {$_.State -eq 'Online'}
foreach ($OnlineCSV in $OnlineCSVs) { Stop-ClusterResource -Name $OnlineCSV.Name -ErrorAction SilentlyContinue }
} until (($OnlineCSVs | measure).Count -eq 0)
#Exit-PSSession
Import-Module StarWindX
# Set Starwind devices in maintenance mode
Write-Host "Putting Starwind in maintenance mode"
try {
$server = New-SWServer -host $NodeName -port 3261 -user root -password starwind
$server.Connect()
foreach($device in $server.Devices) {
if( !$device ) {
Write-Host "No device found" -foreground red
return
} else {
$disk = $device.Name
if ($device.Name -like "HAimage*") {
$device.SwitchMaintenanceMode($true, $true)
Write-Host "$($disk): Entered maintenance mode"
} else {
Write-Host "$($disk): Not an HA device"
}
}
}
}
catch {
Write-Host $_ -foreground red
}
finally {
# Only disconnect if the connection object was actually created
if ($server) { $server.Disconnect() }
}
# Create a scheduled task to disable maintenance mode on startup
Write-Host "Creating scheduled task to exit maintenance mode"
try {
$action = New-ScheduledTaskAction -Execute "Powershell.exe" -Argument '-command "Powershell -ExecutionPolicy Bypass -NoProfile -File C:\Scripts\SW-ExitMaintenance.ps1 > C:\Scripts\SW-ExitMaintenance.log 2>&1"'
$trigger = New-ScheduledTaskTrigger -AtStartup -RandomDelay 00:00:30
$settings = New-ScheduledTaskSettingsSet -Compatibility Win8
$principal = New-ScheduledTaskPrincipal -UserId SYSTEM -LogonType ServiceAccount -RunLevel Highest
$definition = New-ScheduledTask -Action $action -Principal $principal -Trigger $trigger -Settings $settings -Description "Exit maintenance mode for Starwind HA devices"
Register-ScheduledTask -TaskName "Maintenance Mode Off" -InputObject $definition > $null 2>&1
}
catch {
Write-Host $_ -foreground red
}
# Set the Cluster and Virtual Machine Management services to manual
Write-Host "Setting cluster and vmms services to manual startup"
Get-Service -Name vmms | Set-Service -StartupType Manual
Get-Service -Name ClusSvc | Set-Service -StartupType Manual
# Shut down the node
Write-Host "Stopping cluster node"
Stop-Computer -Force
Code:
$NodeFile = "C:\Scripts\nodes.tmp"
$ServiceTimeout = 120
$NodeName = (Get-WmiObject win32_computersystem).DNSHostName+"."+(Get-WmiObject win32_computersystem).Domain
# Get the cluster node names that were saved by the UPS-Shutdown-Node script (can't use Get-ClusterNode as cluster not started yet)
$nodes = Get-Content -Path $NodeFile
Import-Module StarWindX
$ServiceName = "StarWindService"
do {
sleep 1
$s1 = Get-Service -ComputerName $nodes[0] -Name $ServiceName -ErrorAction SilentlyContinue
$s2 = Get-Service -ComputerName $nodes[1] -Name $ServiceName -ErrorAction SilentlyContinue
} until (($s1.Status -eq "Running") -and ($s2.Status -eq "Running"))
Write-Host "$ServiceName is running on both nodes"
Start-Sleep -Milliseconds (Get-Random -Maximum 5000)
# Take Starwind devices out of maintenance mode
Write-Host "Taking Starwind out of maintenance mode"
try {
$server = New-SWServer -host $NodeName -port 3261 -user root -password starwind
$server.Connect()
foreach ($device in $server.Devices) {
if (!$device) {
Write-Host "No device found" -foreground red
return
} else {
$disk = $device.Name
if ($device.Name -like "HAimage*") {
try {
$device.SwitchMaintenanceMode($false, $true)
Write-Host "$($disk): Exited maintenance mode"
}
catch {
Write-Host "$($disk): $($_)"
}
} else {
Write-Host "$($disk): Not an HA device"
}
}
}
}
catch {
Write-Host $_ -foreground red
}
finally {
# Only disconnect if the connection object was actually created
if ($server) { $server.Disconnect() }
}
# Set the Cluster and Virtual Machine Management services to automatic, and start the services
Write-Host "Setting cluster and vmms services to automatic startup and starting them"
Get-Service -Name vmms | Set-Service -StartupType Automatic
Get-Service -Name vmms | Start-Service
Get-Service -Name ClusSvc | Set-Service -StartupType Automatic
Get-Service -Name ClusSvc | Start-Service
$c1 = 0
$c2 = 0
$c3 = 0
$c4 = 0
do {
$s1 = (Get-Service -ComputerName $nodes[0] -Name vmms).Status
$s2 = (Get-Service -ComputerName $nodes[0] -Name ClusSvc).Status
$s3 = (Get-Service -ComputerName $nodes[1] -Name vmms).Status
$s4 = (Get-Service -ComputerName $nodes[1] -Name ClusSvc).Status
if ($s1 -ne "Running") { $c1 += 1 } else { Write-Host "$($nodes[0]): Virtual Machine Management service running after $($c1) seconds" }
if ($s2 -ne "Running") { $c2 += 1 } else { Write-Host "$($nodes[0]): Cluster service running after $($c2) seconds" }
if ($s3 -ne "Running") { $c3 += 1 } else { Write-Host "$($nodes[1]): Virtual Machine Management service running after $($c3) seconds" }
if ($s4 -ne "Running") { $c4 += 1 } else { Write-Host "$($nodes[1]): Cluster service running after $($c4) seconds" }
if (($s1 -ne "Running") -or ($s2 -ne "Running") -or ($s3 -ne "Running") -or ($s4 -ne "Running")) { Sleep 1 }
} until ((($s1 -eq "Running") -and ($s2 -eq "Running") -and ($s3 -eq "Running") -and ($s4 -eq "Running")) -or ($c1 -gt $ServiceTimeout) -or ($c2 -gt $ServiceTimeout) -or ($c3 -gt $ServiceTimeout) -or ($c4 -gt $ServiceTimeout))
if (($c1 -gt $ServiceTimeout) -or ($c2 -gt $ServiceTimeout) -or ($c3 -gt $ServiceTimeout) -or ($c4 -gt $ServiceTimeout)) {
Write-Host "A service failed to start"
exit
}
# Make sure all CSVs are online
Write-Host "Waiting for all cluster shared volumes to come online"
do {
Sleep 1
$OfflineCSVs = Get-ClusterSharedVolume | ? {$_.State -ne 'Online'}
foreach ($OfflineCSV in $OfflineCSVs) { Start-ClusterResource -Name $OfflineCSV.Name -ErrorAction SilentlyContinue > $null 2>&1 }
} until (($OfflineCSVs | measure).Count -eq 0)
# Small pause to ensure cluster recognises that cluster shared volumes are up
Sleep 15
# Start the VMs on all nodes, and set any node to be the owner
Write-Host "Starting virtual machines"
Get-ClusterResource | ? {($_.ResourceType -eq "Virtual Machine") -and ($_.State -eq "Offline")} | Start-ClusterResource
Write-Host "Setting virtual machine possible owners to all nodes"
# No state filter here: the VMs were just started above, so filtering on "Offline" would match nothing
Get-ClusterResource | ? {$_.ResourceType -eq "Virtual Machine"} | Set-ClusterOwnerNode -Owners (Get-ClusterNode).Name
Write-Host "Unregistering scheduled task"
Unregister-ScheduledTask -TaskName "Maintenance Mode Off" -Confirm:$false -ErrorAction SilentlyContinue
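For anyone adapting the exit script: the four-counter service wait loop could probably be collapsed into one small polling helper. This is only a sketch and not part of the scripts posted above. The function name Wait-Condition and its parameters are made up for illustration, and it uses nothing beyond built-in PowerShell (Get-Date, Start-Sleep).

```powershell
# Hypothetical helper (not from the original scripts): poll a condition
# until it returns $true or the timeout expires.
function Wait-Condition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 120,
        [int] $PollSeconds = 1
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        # Success as soon as the condition evaluates to something truthy
        if (& $Condition) { return $true }
        # Give up once the deadline has passed
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $PollSeconds
    }
}

# Possible usage with the same checks as the exit script (assumes $nodes
# and $ServiceTimeout are set as they are there):
# $ok = Wait-Condition -TimeoutSeconds $ServiceTimeout -Condition {
#     ((Get-Service -ComputerName $nodes[0] -Name vmms).Status -eq 'Running') -and
#     ((Get-Service -ComputerName $nodes[1] -Name vmms).Status -eq 'Running')
# }
# if (-not $ok) { Write-Host "A service failed to start"; exit }
```

Unlike the original loop, this version does not report how long each individual service took to start. If that per-service timing matters to you, keep the counters.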