Recover HA after second node failure

Posted: Wed May 19, 2021 5:30 pm
by harish.patil
We created a 2-node HA setup with vSAN Free. Due to an OS error, we had to format and reinstall the secondary node. Now we want to restore the earlier HA setup. How do we do that, and which scripts need to be run, step by step? Can someone help with this? Our storage is currently running on the primary (healthy) node. Details are as follows:

Node 1 (Healthy)
IP: 10.10.10.6
HB Interface: 192.168.100.6
Devices:
quorum - HAImage1 - 50GB
datastor - HAImage2 - 500GB
filestor - HAImage3 - 1000GB
linuxstor - HAImage4 - 1000GB
csv - HAImage5 - 500GB

Node 2 (new)
IP: 10.10.10.8
HB Interface: 192.168.100.9


Please help!

Re: Recover HA after second node failure

Posted: Wed May 19, 2021 6:54 pm
by harish.patil
P.S. The images are still available on the new node.

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 7:45 am
by yaroslav (staff)
Welcome to the StarWind Forum. Are the HA devices on the affected node gray? If so, please share the StarWind.cfg file from the affected node with me. You can also try removing the grayed-out partner HAs: on the healthy node, run RemoveHAPartner.ps1 from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell. If that does not remove the gray HA devices, share the .cfg file with me here.
If the gray HA devices go away, replicate the healthy HAs to the affected node with AddHAPartner.ps1 from the same folder.

Let me know if you have additional questions.

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 8:01 am
by harish.patil
Hi yaroslav,

Thanks for replying.

The affected node does not have any HA devices, as it is newly installed; however, the image files are available. We want to know how to reconfigure the new node so that it communicates with the existing healthy node.

Thanks

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 12:11 pm
by yaroslav (staff)
Do you have an old config file from the affected node?

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 12:36 pm
by harish.patil
I do not have the "StarWind.cfg" file; however, I have the ".swdsk" and "_HA.swdsk" configuration files for each device. Let me know if that helps.

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 4:34 pm
by yaroslav (staff)
Hi,
There are 2 ways to restore everything.
1. Long and easy, with no risk: delete the files from the underlying storage and start replicating the disks by running AddHAPartner.
2. Complicated and fast: modify the config file (share both StarWind.cfg files with me). I am not sure it works as it should, so please make sure you have solid backups that are not located on StarWind HAs.

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 4:51 pm
by harish.patil
I have attached the config files from both nodes. Meanwhile, we will try to recreate one of the images. We will also ensure that we take proper backups of all VMs and data.

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 5:15 pm
by yaroslav (staff)
Hi, the .cfg file from the affected node still lists all the devices. Would you be so kind as to provide a screenshot from the Management Console?

Re: Recover HA after second node failure

Posted: Thu May 20, 2021 7:09 pm
by harish.patil
Please find attached screenshot of the console.

Re: Recover HA after second node failure

Posted: Fri May 21, 2021 12:53 pm
by yaroslav (staff)
You need to remove the replica to the affected server (run RemoveHAPartner.ps1 from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell), remove the HAs on the affected server from the underlying storage and recreate the replica with AddHAPartner.ps1.
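
As a rough sketch of the first step, the logic of the sample RemoveHAPartner.ps1 could be looped over all five devices from the healthy node. The device names and aliases are taken from the first post. The partner IQN prefix for the rebuilt node and the -Apply switch are assumptions made for illustration only, so check the real partner target names in the Management Console before running anything. Without -Apply the script only prints the plan and changes nothing.

```powershell
param(
    $addr = "192.168.100.6",   # healthy node (from the first post)
    $port = 3261,
    $user = "root",
    $password = "starwind",
    # ASSUMPTION: prefix of the rebuilt node's targets, modelled on the healthy node's IQN
    $partnerIqnPrefix = "iqn.2008-08.com.starwindsoftware:revmaxsr9.revmax.co.in-",
    [switch]$Apply             # hypothetical switch: without it nothing is changed
)

# Device name -> target alias, as listed in the first post
$devices = [ordered]@{
    HAImage1 = "quorum"
    HAImage2 = "datastor"
    HAImage3 = "filestor"
    HAImage4 = "linuxstor"
    HAImage5 = "csv"
}

# Build one (device, partner target) pair per HA device
$plan = foreach ($name in $devices.Keys) {
    [pscustomobject]@{
        DeviceName        = $name
        PartnerTargetName = $partnerIqnPrefix + $devices[$name]
    }
}
$plan | Format-Table -AutoSize

if ($Apply) {
    Import-Module StarWindX
    $server = $null
    try {
        $server = New-SWServer $addr $port $user $password
        $server.Connect()
        foreach ($p in $plan) {
            # Same call as in the sample RemoveHAPartner.ps1, once per device
            Remove-HAPartner $server -deviceName $p.DeviceName -partnerTargetName $p.PartnerTargetName
        }
    }
    catch {
        Write-Host $_ -foreground red
    }
    finally {
        # Disconnect only if the server object was actually created
        if ($server) { $server.Disconnect() }
    }
}
```

Running it first without -Apply makes it easy to compare the generated target names against what the console shows for each device.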

Re: Recover HA after second node failure

Posted: Fri May 21, 2021 6:34 pm
by harish.patil
Hi,
Sorry for the late reply.
I ran the RemoveHAPartner code below successfully:

Code: Select all

param($addr="192.168.100.9", $port=3261, $user="root", $password="starwind", $deviceName="HAImage4", $partnerTargetName="iqn.2008-08.com.starwindsoftware:revmaxsr6.revmax.co.in-linuxstor")

#
# RemoveHAPartner.ps1
#
Import-Module StarWindX

try
{
    Enable-SWXLog

    $server = New-SWServer $addr $port $user $password
    $server.Connect()

    Remove-HAPartner $server -deviceName $deviceName -partnerTargetName $partnerTargetName
}

catch
{
    Write-Host $_ -foreground red
}

finally
{
    $server.Disconnect()
}


But I am facing an issue with the AddHAPartner code, which is given below:

Code: Select all

param($addr="192.168.100.9", $port=3261, $user="root", $password="starwind", $deviceName="HAImage4",
    $addr2="192.168.100.6",$port2=$port, $user2=$user, $password2=$password,
#secondary node
    $imagePath2="My computer\C\starwind",
    $imageName2="linuxstor",
    $createImage2=$true,
    $targetAlias2="linuxstor",
    $autoSynch2=$true,
    $poolName2="pool1",
    $syncSessionCount2=1,
    $aluaOptimized2=$true,
    $syncInterface2="#p1={0}" -f $addr,
    $hbInterface2="",
    $selfSyncInterface="#p1={0}" -f $addr2,
    $selfHbInterface=""
    )
    
Import-Module StarWindX

try
{
    Enable-SWXLog -level SW_LOG_LEVEL_DEBUG
    
    $server = New-SWServer $addr $port $user $password
    $server.Connect()

    $device = Get-Device $server -name $deviceName
    if( !$device )
    {
        Write-Host "Device not found" -foreground red
        return
    }

    $node = new-Object Node
    $node.HostName = $addr2
    $node.HostPort = $port2
    $node.Login = $user2
    $node.Password = $password2
    $node.ImagePath = $imagePath2
    $node.ImageName = $imageName2
    $node.CreateImage = $createImage2
    $node.TargetAlias = $targetAlias2
    $node.SyncInterface = $syncInterface2
    $node.HBInterface = $hbInterface2
    $node.AutoSynch = $autoSynch2
    $node.SyncSessionCount = $syncSessionCount2
    $node.ALUAOptimized = $aluaOptimized2
    $node.PoolName = $poolName2

    Add-HAPartner $device $node $selfSyncInterface $selfHbInterface
}
catch
{
    Write-Host $_ -foreground red 
}
finally
{
    $server.Disconnect()
}



The above code gives me the error below:
PS C:\Users\administrator.REVMAX\Desktop> C:\Users\administrator.REVMAX\Desktop\AddHaPartner.ps1
Exception calling "AddPartner" with "1" argument(s): "Request to REVMAXSR9.REVMAX.CO.IN ( 192.168.100.9 ) : 3261
-
control 0x000000FD39878A00 -AddPartner:"" -PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:revmaxsr6.revmax.co.in-linuxstor" -Priority:"#p1=2" -nodeType:"#p1=1" -
PartnerIP:"#p1=REVMAXSR6.REVMAX.CO.IN:1" -AuthChapType:"#p1=none" -AuthChapLogin:"#p1=0b" -AuthChapPassword:"#p1=0b" -AuthMChapName:"#p1=0b" -AuthMChapSecret:"#p1=0b" -Re
plicator:"#p1=0"
-
200 Failed: invalid partner info.. "

PS C:\Users\administrator.REVMAX\Desktop>
I am also attaching a screenshot of the affected node's console.

Re: Recover HA after second node failure

Posted: Sat May 22, 2021 9:03 am
by yaroslav (staff)
Did you remove the replication partners from the underlying storage and the StarWind Console prior to adding the replicas?

Re: Recover HA after second node failure

Posted: Sun May 23, 2021 8:05 am
by harish.patil
I ran the "RemoveHAPartner" script as per your last reply. I am using the free edition, hence I can't use the Management Console to remove the partner.

Re: Recover HA after second node failure

Posted: Sun May 23, 2021 12:33 pm
by yaroslav (staff)
The Management Console is still available for monitoring. Did you run the script for all devices? Were the replicas removed for the selected devices in the Management Console?