Recreating HA Availability after failure in 2 Node VSAN


Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Mon Feb 22, 2021 6:11 am

Hi All,

I have a 2-node VSAN that is running a Hyper-V cluster. Unfortunately, the storage on one of the nodes partially failed, and I had to rebuild that node and reinstall StarWind on it. My challenge is that the Hyper-V cluster is still running on the remaining node, and while I have a backup of the cluster's VMs and file shares, I want to avoid restoring from backup if I can. I would just like to recreate HA between the nodes using the existing storage, without losing data. I originally created the VSAN with the StarWindX PowerShell scripts, and it ran seamlessly until now. I have looked through the scripts but can't work out which ones to use to recover HA. Can anyone advise?
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Mon Feb 22, 2021 7:05 am

Please drain the roles from the unhealthy node (in Failover Cluster Manager, go to Nodes -> Pause -> Drain Roles).
Then, on the healthy node, run the RemoveHAPartner script followed by the AddHAPartner script.
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Mon Feb 22, 2021 8:28 am

Hi Yaroslav,

Thank you for the advice. Just to be clear: in the AddHAPartner script, $addr is the address of the healthy node and $addr2 is the address of the unhealthy node, i.e. $addr2 is the node on which I am trying to recreate the HA VSAN storage?
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Mon Feb 22, 2021 9:05 am

That's right.
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Mon Feb 22, 2021 10:43 am

Hi Yaroslav,

I updated AddHAPartner with the following parameters,

Code:

param($addr="10.0.2.12", $port=3261, $user="root", $password="starwind", $deviceName="HAImage4",
	$addr2="10.0.2.14", $port2=$port, $user2=$user, $password2=$password,
#secondary node
	$imagePath2="My computer\D\starwind",
	$imageName2="partnerImg22",
	$createImage2=$true,
	$targetAlias2="partnerha22",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$true,
	$syncInterface2="#p1=172.16.20.10:3260" -f $addr,
    $hbInterface2="#p1=172.16.10.10:3260" -f $addr,
    $selfSyncInterface="#p2=172.16.20.20:3260" -f $addr2,
    $selfHbInterface="#p2=172.16.10.20:3260" -f $addr2
	)


Import-Module StarWindX

try
{
    Enable-SWXLog -level SW_LOG_LEVEL_DEBUG
    
    $server = New-SWServer $addr $port $user $password
    $server.Connect()

	$device = Get-Device $server -name $deviceName
	if( !$device )
	{
		Write-Host "Device not found" -foreground red
		return
	}

    $node = new-Object Node
    $node.HostName = $addr2
    $node.HostPort = $port2
    $node.Login = $user2
    $node.Password = $password2
    $node.ImagePath = $imagePath2
    $node.ImageName = $imageName2
    $node.CreateImage = $createImage2
    $node.TargetAlias = $targetAlias2
    $node.SyncInterface = $syncInterface2
    $node.HBInterface = $hbInterface2
	$node.AutoSynch = $autoSynch2
	$node.SyncSessionCount = $syncSessionCount2
	$node.ALUAOptimized = $aluaOptimized2
	$node.PoolName = $poolName2

    Add-HAPartner $device $node $selfSyncInterface $selfHbInterface
}
catch
{
	Write-Host $_ -foreground red 
}
finally
{
	$server.Disconnect()
}
However, I get the following error:
Request to GIDSHC01.INTERNAL.GULFID.COM ( 10.0.2.12 ) : 3261
-
control 0x0000020A858C0240 -AddPartner:"" -PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:gidshc02-partnerha22;#p2=iqn.2008-08.com.starwindsoftware:gidshc02-partnerha22" -Priority:"#p1=2;#p2=2" -nodeType:"#p1=1;#p2=1" -PartnerIP:"#p1=172.16.20.20:sync:3260:1,172.16.10.20:heartbeat:3260:1;#p2=172.16.20.20:sync:3260:1,172.16.10.20:heartbeat:3260:1" -AuthChapType:"#p1=none;#p2=none" -AuthChapLogin:"#p1=0b;#p2=0b" -AuthChapPassword:"#p1=0b;#p2=0b" -AuthMChapName:"#p1=0b;#p2=0b" -AuthMChapSecret:"#p1=0b;#p2=0b" -Replicator:"#p1=0;#p2=0"
-
200 Failed: operation cannot be completed..

Node 1 - Healthy Node
IP Address = 10.0.2.12
Sync If Address = 172.16.20.10
Heartbeat Address = 172.16.10.10
Image to be recovered HAImage4 on Node 1


Node 2 - Unhealthy Node
IP Address = 10.0.2.14
Sync If Address = 172.16.20.20
Heartbeat Address = 172.16.10.20




I would appreciate any guidance on where I am going wrong here. Thanks.
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Mon Feb 22, 2021 10:56 am

Hi,

Make sure to specify $syncInterface2="#p1={0}:3260" -f $addr and $selfSyncInterface="#p1={0}:3260" -f $addr2.
$selfHbInterface="" and $hbInterface2="" should stay empty.
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Mon Feb 22, 2021 1:28 pm

Thank you for your prompt reply. When I first ran the script, I used the same variables you mention in your last reply (the same as in the StarWindX library).
yaroslav (staff) wrote:Hi,

Make sure to specify $syncInterface2="#p1={0}:3260" -f $addr and $selfSyncInterface="#p1={0}:3260" -f $addr2.
$selfHbInterface="" and $hbInterface2="" should stay empty.

However, I still get an error when I run it using your suggestion

Code:

Request to  GIDSHC01.INTERNAL.GULFID.COM ( 10.0.2.12 ) : 3261
-
control 0x0000020A858C0240 -AddPartner:"" -PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:gidshc02-partnerha22;#p2=iqn.2008-08.com.starwindsoftware:gidshc02-partnerha22" -Priority:"#p1=2;#p2=2" -nodeType:"#p1=1;#p2=1" -PartnerIP:"#p1=10.0.2.14:sync:3260:1;#p2=10.0.2.14:sync:3260:1" -AuthChapType:"#p1=none;#p2=none" -AuthChapLogin:"#p1=0b;#p2=0b" -AuthChapPassword:"#p1=0b;#p2=0b" -AuthMChapName:"#p1=0b;#p2=0b" -AuthMChapSecret:"#p1=0b;#p2=0b" -Replicator:"#p1=0;#p2=0"
-
200 Failed: invalid partner info.. 
The reason I changed the variables in my sample code is that I used those settings in CreateHA_2 when I originally set up the VSAN.

Any thoughts on why the request still fails?
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Mon Feb 22, 2021 3:19 pm

Hi,

Can I have your script? Please make sure that HAImage4 exists.
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Tue Feb 23, 2021 5:00 am

Script attached

Code:

param($addr="10.0.2.12", $port=3261, $user="root", $password="starwind", $deviceName="HAImage4",
	$addr2="10.0.2.14", $port2=$port, $user2=$user, $password2=$password,
#secondary node
	$imagePath2="My computer\D\starwind",
	$imageName2="partnerImg22",
	$createImage2=$true,
	$targetAlias2="partnerha22",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$true,
	$syncInterface2="#p1={0}:3260" -f $addr,
    $hbInterface2="",
    $selfSyncInterface="#p1={0}:3260" -f $addr2,
    $selfHbInterface=""
	)


Import-Module StarWindX

try
{
    Enable-SWXLog -level SW_LOG_LEVEL_DEBUG
    
    $server = New-SWServer $addr $port $user $password
    $server.Connect()

	$device = Get-Device $server -name $deviceName
	if( !$device )
	{
		Write-Host "Device not found" -foreground red
		return
	}

    $node = new-Object Node
    $node.HostName = $addr2
    $node.HostPort = $port2
    $node.Login = $user2
    $node.Password = $password2
    $node.ImagePath = $imagePath2
    $node.ImageName = $imageName2
    $node.CreateImage = $createImage2
    $node.TargetAlias = $targetAlias2
    $node.SyncInterface = $syncInterface2
    $node.HBInterface = $hbInterface2
	$node.AutoSynch = $autoSynch2
	$node.SyncSessionCount = $syncSessionCount2
	$node.ALUAOptimized = $aluaOptimized2
	$node.PoolName = $poolName2

    Add-HAPartner $device $node $selfSyncInterface $selfHbInterface
}
catch
{
	Write-Host $_ -foreground red 
}
finally
{
	$server.Disconnect()
}



When you say make sure HAImage4 exists, do you mean just on the healthy node? Currently HAImage4 is only on the healthy node.
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Tue Feb 23, 2021 5:33 am

Here you need to put the actual IP addresses in place of {0}:
$selfSyncInterface="#p1={0}:3260" -f $addr2,
$syncInterface2="#p1={0}:3260" -f $addr,

#secondary node
$imagePath2="My computer\D\starwind",
$imageName2="partnerImg22",
$createImage2=$true,
$targetAlias2="partnerha22",
$autoSynch2=$true,
$poolName2="pool1",
$syncSessionCount2=1,
$aluaOptimized2=$true,
$syncInterface2="#p1=172.16.20.20:3260" -f $addr,
$hbInterface2="",
$selfSyncInterface="#p1=172.16.20.10:3260" -f $addr2,
$selfHbInterface=""
)

Try this secondary node block. Make sure that HAImage4 exists on 10.0.2.12.
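
A side note on the -f operator that still trails the parameter lines above: PowerShell's format operator only substitutes where the string contains a placeholder such as {0}. Once the IP address is written literally, the trailing -f $addr has no effect and can be dropped. A minimal sketch in plain PowerShell (no StarWindX calls; the variable names $templated and $literal are only for illustration):

```powershell
# Management addresses as used in this thread.
$addr  = "10.0.2.12"   # healthy node
$addr2 = "10.0.2.14"   # node being re-added

# With a {0} placeholder, -f inserts the argument into the string.
$templated = "#p1={0}:3260" -f $addr2

# Without a placeholder, -f leaves the string unchanged.
$literal = "#p1=172.16.20.10:3260" -f $addr

Write-Host $templated
Write-Host $literal
```

So with {0} the interface string carries the management address passed in $addr or $addr2, while a literal string carries whatever IP is typed into it, regardless of the -f argument.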
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Tue Feb 23, 2021 10:59 am

I get the following error when making the changes

Code:

Request to  GIDSHC01.INTERNAL.GULFID.COM ( 10.0.2.12 ) : 3261
-
control 0x0000020A858C0240 -AddPartner:"" -PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:gidshc02-partnerha22;#p2=iqn.2008-08.com.starwindsoftware:gidshc02-partnerha22" -Priority:"#p1=2;#p2=2" -nodeType:"#p1=1;#p2=1" -PartnerIP:"#p1=172.16.20.10:sync:3260:1;#p2=172.16.20.10:sync:3260:1" -AuthChapType:"#p1=none;#p2=none" -AuthChapLogin:"#p1=0b;#p2=0b" -AuthChapPassword:"#p1=0b;#p2=0b" -AuthMChapName:"#p1=0b;#p2=0b" -AuthMChapSecret:"#p1=0b;#p2=0b" -Replicator:"#p1=0;#p2=0"
-
200 Failed: invalid partner info..
10.0.2.12 has HAImage4.
Attachment: StarWind VSAN.jpg (128.05 KiB)
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 25, 2021 10:34 am

Hi,

Please make sure the folder has been created.

Code:

param($addr="10.0.2.12", $port=3261, $user="root", $password="starwind", $deviceName="HAImage4",
 $addr2="10.0.2.14", $port2=$port, $user2=$user, $password2=$password,
#secondary node
 $imagePath2="My computer\D\starwind",
 $imageName2="imagefile%ADD_YOUR_NUMBER%",
 $createImage2=$true,
 $targetAlias2="ADD HERE YOUR TARGET NAME",
 $autoSynch2=$true,
 $poolName2="pool1",
 $syncSessionCount2=1,
 $aluaOptimized2=$true,
 $syncInterface2="#p2=172.16.20.10:3260",
    $hbInterface2="#p2=172.16.10.10:3260",
    $selfSyncInterface="#p1=172.16.20.20:3260",
    $selfHbInterface="#p1=172.16.10.20:3260"
 )
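
Read against the node details posted earlier in the thread, the block above puts the healthy node's sync and heartbeat IPs into $syncInterface2 and $hbInterface2 under the #p2 prefix, and the re-added node's IPs into $selfSyncInterface and $selfHbInterface under #p1. The sketch below only rebuilds those four strings from the two address sets so the pattern is easier to adapt. It is inferred from the values in this thread, not from StarWindX documentation, and the hashtables and the Format-Interface helper are hypothetical names.

```powershell
# Addresses taken from the node details posted earlier in this thread.
$healthy = @{ Sync = "172.16.20.10"; Heartbeat = "172.16.10.10" }  # node 1
$readded = @{ Sync = "172.16.20.20"; Heartbeat = "172.16.10.20" }  # node 2

# Hypothetical helper: builds "#pN=<ip>:<port>" strings like the ones above.
function Format-Interface([string]$Prefix, [string]$Ip, [int]$Port = 3260) {
    "#{0}={1}:{2}" -f $Prefix, $Ip, $Port
}

# Same values as the parameter block above, derived from the two hashtables.
$syncInterface2    = Format-Interface "p2" $healthy.Sync
$hbInterface2      = Format-Interface "p2" $healthy.Heartbeat
$selfSyncInterface = Format-Interface "p1" $readded.Sync
$selfHbInterface   = Format-Interface "p1" $readded.Heartbeat

$syncInterface2, $hbInterface2, $selfSyncInterface, $selfHbInterface |
    ForEach-Object { Write-Host $_ }
```

The resulting strings can be compared against the parameter block before running the AddHAPartner script.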
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Mon Mar 01, 2021 5:53 am

Thanks Yaroslav it worked!

I took my time trying it out because, in the meantime, I had already started manually editing StarWind.cfg on both nodes and the *_HA.swdsk files on the failed node (node 2). I managed to recover the failed node and all its VSAN disks using this method, which gave me a much better understanding of how StarWind VSAN can be configured. All of this was done without losing any data while node 2 was down, which saved me from having to use backups.

I then tried the script you suggested. I deleted HAImage4 (the .img and .swdsk files) on node 2, which caused the issue I first reported at the start of the thread. I then ran RemoveHAPartner followed by your modified AddHAPartner script above. It worked seamlessly, and node 2 started syncing with a new HAImage4 (with new *.img and *.swdsk files), fully configured to match HAImage4 on node 1.

I do have a question, though, which came up when I manually edited the *.swdsk and *_HA.swdsk files. On node 1 (the good node), HAImage1.swdsk, which was created last year using the StarWindX scripts, has a cylinder count of "5", whereas the corresponding HAImage1_HA.swdsk has a count of "65535". Similarly, HAImage2 has "5" and "65535". HAImage4 has "51200" and "51200"; it was created very recently, within the last month. I should also note that node 2 has the same cylinder counts in all its *.swdsk and *_HA.swdsk files as the corresponding files on node 1. So what is the significance of the count in the *.swdsk and *_HA.swdsk files for HAImage1 and HAImage2? Do I have an issue I should be worried about?

Great product BTW :)
yaroslav (staff)
Staff
Posts: 2340
Joined: Mon Nov 18, 2019 11:11 am

Mon Mar 01, 2021 7:17 am

Do not worry about the number of cylinders. Provided that you did not resize the HA device via the config file it should be good. Just make sure that the cylinder number on both sides is equal.
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Mon Mar 01, 2021 7:42 am

Got it. Thanks Yaroslav. Cylinder number on both sides is equal and I didn't do any HA device resizing.