VSAN Free recover from RAID array crash

RobT64
Posts: 10
Joined: Thu Oct 14, 2021 3:51 am

Thu Oct 14, 2021 4:08 am

Hi all,

I had two hard drives fail while I was away on leave, which took down the RAID array on one node of the VSAN. I've rebuilt the array, and all the config was still there because the boot drive is separate. I ran RemoveHAPartner for both images on the working node so it could come online. I'm trying to run AddHaPartner on the recovered machine but I cannot figure out the syntax of the script. Is there a detailed help file anywhere to assist me in getting this running again?

This is for a disaster recovery cluster, so not critical, (unless we need it :( )

Rob.
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Thu Oct 14, 2021 6:10 am

Hi,
Run it from the healthy node and just fill in the IP addresses.
Make sure to delete the files from the underlying storage of the affected node before running the script.
RobT64

Thu Oct 14, 2021 10:39 pm

I actually did that before posting. I get "200 Failed: operation cannot be completed." Because I am partially blind, and the responses are in very small writing, in red, that's about all I can read. I'm sure it's gonna be a syntax thing cos I can see both nodes in the management console (even though I can't edit them without a paid licence). I can see CLStorage1 and Witness1 targets on the other node, with no devices attached. So I'm guessing it's just a matter of figuring out which IP addresses to put where, and which alias and device names matter in the script.

If I can get some documentation or examples of exactly what each line means I will build recovery scripts and save them on another location so I have everything I need if this happens again. I just need help setting them up first.

Rob.
RobT64

Mon Oct 18, 2021 1:50 am

Can anyone else help? I'm happy to post examples of the scripts I created and network details of the node. I've got as far as getting it to recreate the image file, but it doesn't copy the licence text files and fails to enable HA. If one doesn't exist, I'm happy to create my own idiot's guide. I don't use these scripts anywhere near often enough to remember them.

Rob.
yaroslav (staff)

Mon Oct 18, 2021 2:58 am

Could you please check if the files were removed from the underlying storage of the affected server?
Can I have the screenshot of the affected server in the management console, please?
RobT64

Mon Oct 18, 2021 3:12 am

I just broke the other node trying to fix things :( I used the commands to remove the device on the remote node but forgot to edit the IP address. I may or may not have said some very naughty words!

I do have a backup of the two image files. Is there an easy way to recreate the HA stuff using existing files? I have two nodes;

HCCL1SS1 192.168.242.11
sync 10.25.0.11
heartbeat 172.25.0.11

HCCL1SS2 192.168.242.12
sync 10.25.0.12
heartbeat 172.25.0.12

I have two files called Witness1 and Clstorage1, plus the associated text files, restored into the starwind folder on S drive, ("My computer\s\starwind")
yaroslav (staff)

Mon Oct 18, 2021 4:12 am

Did you remove the wrong server from the replication partners OR did you remove the .img and headers from the wrong server?
RobT64

Mon Oct 18, 2021 4:18 am

I now have two servers with no devices or targets. Recreating the witness will not be an issue. But recreating the cluster storage might be, because it's 5 TB. I do still have the image file for the cluster storage. I just need to recreate the devices and targets, then synchronize to the other server.
RobT64

Mon Oct 18, 2021 4:30 am

I just edited the CreateHA(two nodes).ps1 file and saved it as CreateHA(two nodes)Witness1.ps1 with the first bit entered as;

Import-Module StarWindX

try
{
Enable-SWXLog

$server = New-SWServer -host 192.168.242.12 -port 3261 -user root -password starwind

$server.Connect()

$firstNode = new-Object Node

$firstNode.HostName = "HCCL1SS2"
$firstNode.ImagePath = "My computer\S\starwind"
$firstNode.ImageName = "Witness1"
$firstNode.Size = 1024
$firstNode.CreateImage = $true
$firstNode.TargetAlias = "Witness1"
$firstNode.AutoSynch = $true
$firstNode.SyncInterface = "#p2=10.25.0.12:3260"
$firstNode.HBInterface = "#p2=172.25.0.12:3260"
$firstNode.PoolName = "pool1"
$firstNode.SyncSessionCount = 1
$firstNode.ALUAOptimized = $true

#
# device sector size. Possible values: 512 or 4096(May be incompatible with some clients!) bytes.
#
$firstNode.SectorSize = 512

$secondNode = new-Object Node

$secondNode.HostName = "192.168.242.11"
$secondNode.HostPort = "3261"
$secondNode.Login = "root"
$secondNode.Password = "starwind"
$secondNode.ImagePath = "My computer\S\starwind"
$secondNode.ImageName = "Witness1"
$secondNode.Size = 12
$secondNode.CreateImage = $true
$secondNode.TargetAlias = "Witness1"
$secondNode.AutoSynch = $true
$secondNode.SyncInterface = "#p1=10.25.0.11:3260"
$secondNode.HBInterface = "#p1=172.25.0.11:3260"
$secondNode.SyncSessionCount = 1
$secondNode.ALUAOptimized = $true


It's taking a LONG time to get past 0% for only a 1GB file :(
RobT64

Mon Oct 18, 2021 4:36 am

I cancelled it after 10 minutes at still 0% and tried the syncHAadvanced script and got this error;

HAImage1
Device not synchronized. Synchronize current node from partner 'iqn.2008-08.com.starwindsoftware:hccl1ss1.**********.com-witness1'
Request to HCCL1SS2.********.com ( 192.168.242.12 ) : 3261
-
control 0x000000A0D7650840 -RestorePartnerNode:"iqn.2008-08.com.starwindsoftware:hccl1ss1.********.com-witness1"
-
200 Failed: connection with partner node is invalid..
PS C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell>
yaroslav (staff)

Mon Oct 18, 2021 4:44 am

Rob,

For creating the HA device, you must use the partner IP address. See the sample script here https://forums.starwindsoftware.com/vie ... p+3#p31505.
Do you have any data on the 5 TB volume? If so, please log a call with StarWind Support at support@starwind.com, using this thread as a reference.
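
For anyone adapting the sample scripts to the addresses Rob listed earlier, here is a minimal PowerShell sketch of the "use the partner IP" rule. It only builds the interface strings and does not call StarWindX at all. The `$nodes` table and the `Get-PartnerInterface` helper are hypothetical names introduced for illustration, and the assumption that `#p2` tags the second node's address (following the strings quoted in this thread) should be checked against the linked sample script.

```powershell
# Hypothetical node table built from the addresses posted earlier in this thread.
$nodes = @{
    HCCL1SS1 = @{ Mgmt = '192.168.242.11'; Sync = '10.25.0.11'; Heartbeat = '172.25.0.11' }
    HCCL1SS2 = @{ Mgmt = '192.168.242.12'; Sync = '10.25.0.12'; Heartbeat = '172.25.0.12' }
}

# Hypothetical helper: formats "#pN=ip:port" from the PARTNER node's address.
function Get-PartnerInterface {
    param(
        [hashtable]$Partner,                              # $nodes entry for the other node
        [ValidateSet('Sync', 'Heartbeat')][string]$Kind,  # which network to use
        [int]$PartnerIndex = 2,                           # the N in "#pN" (assumed: 2 = second node)
        [int]$Port = 3260                                 # port used in the scripts in this thread
    )
    '#p{0}={1}:{2}' -f $PartnerIndex, $Partner[$Kind], $Port
}

# The script connects to HCCL1SS2 as the first node, so the partner is HCCL1SS1.
$partner       = $nodes['HCCL1SS1']
$syncInterface = Get-PartnerInterface -Partner $partner -Kind Sync
$hbInterface   = Get-PartnerInterface -Partner $partner -Kind Heartbeat

$syncInterface
$hbInterface
```

The two resulting strings are what would go into the first node's `SyncInterface` and `HBInterface` properties. For the second node, the same helper would be called with the HCCL1SS2 entry and `-PartnerIndex 1`.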
RobT64

Mon Oct 18, 2021 5:11 am

I logged a support request to fix the cluster storage. I retried the witness script with IP addresses and got this;

PS C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell> & '.\CreateHA(two nodes)Witness1.ps1'
Connection to HCCL1SS1 ( 10.25.0.11 ) : 3261 has failed

I have a feeling I bound the management to 192.168.242.11 and forced sync and heartbeat to only use the 172.25.0 and 10.25.0 subnets.
I changed the AddHAPartner script to this;
param($addr="192.168.242.12", $port=3261, $user="root", $password="starwind", $deviceName="HAImage1",
$addr2="192.168.242.11", $port2=$port, $user2=$user, $password2=$password,
#secondary node
$imagePath2="My computer\S\Starwind",
$imageName2="Witness1",
$createImage2=$true,
$targetAlias2="Witness1",
$autoSynch2=$true,
$poolName2="pool1",
$syncSessionCount2=1,
$aluaOptimized2=$true,
$cacheMode2="node",
$cacheSize2=0,
$syncInterface2="#p2=10.25.0.12:3260",
$hbInterface2="#p2=172.25.0.12:3260",
$selfSyncInterface="#p1=10.25.0.11:3260",
$selfHbInterface="#p1=172.25.0.11:3260"
)

and got this;
PS C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell> .\AddHaPartnerHCCL1SS2-witness1.ps1
Request to HCCL1SS2.****.com ( 192.168.242.12 ) : 3261
-
control 0x000000A0D5E65F40 -AddPartner:"" -PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:hccl1ss1.****.com-witness1" -Priority:"#p1=2" -nodeType:"#p1=1" -PartnerIP:"#p1=10.25.0.11:sync:3260:1,172.25.0.11:heartbeat:3260:1" -AuthChapType:"#p1=none" -AuthChapLogin:"#p1=0b" -AuthChapPassword:"#p1=0b" -AuthMChapName:"#p1=0b" -AuthMChapSecret:"#p1=0b" -Replicator:"#p1=0"
-
200 Failed: operation cannot be completed..
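
The "Connection to HCCL1SS1 ( 10.25.0.11 ) : 3261 has failed" message in this post fits Rob's guess that the management port only answers on the 192.168.242.x addresses. Below is a minimal sketch for checking that guess from the other node. It uses the .NET `System.Net.Sockets.TcpClient` class, so it needs no StarWind components. `Test-TcpPort` is a hypothetical helper name, the address list is node HCCL1SS1's three addresses from earlier in the thread, and 3261 is the management port the scripts connect to.

```powershell
# Hypothetical helper: returns $true if a TCP connection opens within the timeout.
function Test-TcpPort {
    param([string]$Address, [int]$Port, [int]$TimeoutMs = 2000)
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($Address, $Port)
        # Wait() returns $false on timeout and throws if the connection is refused.
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false
    } finally {
        $client.Close()
    }
}

# Management, sync and heartbeat addresses of HCCL1SS1, as posted in this thread.
$addresses = '192.168.242.11', '10.25.0.11', '172.25.0.11'

$results = foreach ($ip in $addresses) {
    [pscustomobject]@{
        Address   = $ip
        Port      = 3261
        Reachable = Test-TcpPort -Address $ip -Port 3261
    }
}

$results | Format-Table -AutoSize
```

If only the 192.168.242.11 row shows as reachable, that would support pointing the scripts' connection address at the management subnet and keeping the 10.25.0.x and 172.25.0.x addresses for the sync and heartbeat interface strings.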
yaroslav (staff)

Mon Oct 18, 2021 5:21 am

Rob,

Please see a good example of the CreateHA script here https://forums.starwindsoftware.com/vie ... p+3#p31505.
RobT64

Wed Oct 20, 2021 7:13 am

I never wound up finding a solution for the scripts. I looked at the "good examples" and they did not work either. StarWind sent me a trial code which I used to recreate everything in the management console, before switching back to the "free" key.

Rob.
yaroslav (staff)

Wed Oct 20, 2021 7:22 am

Hi Rob,

That was a custom trial key. A regular one will not let you switch back to the free license.
Thanks for your time and effort!