Recover HA after second node failure

harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Sun May 23, 2021 4:16 pm

Yes, the replica was removed for the selected device only.
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Mon May 24, 2021 6:02 am

We ran the script against one device that is currently not in use, and used it to remove the replica for that device only.
yaroslav (staff)
Staff
Posts: 2359
Joined: Mon Nov 18, 2019 11:11 am

Mon May 24, 2021 6:21 am

Please remove all the devices from the affected server and redo the replication for all devices. Make sure to run the script from the healthy side.
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Mon May 24, 2021 7:35 am

Please let me know which script I have to run from the healthy node.
yaroslav (staff)
Staff
Posts: 2359
Joined: Mon Nov 18, 2019 11:11 am

Mon May 24, 2021 7:50 am

You need to remove the replica to the affected server (run RemoveHAPartner.ps1 from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell on the healthy server), delete the HA devices on the affected server from the underlying storage, and then recreate the replica with AddHAPartner.ps1.
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Mon May 24, 2021 7:56 am

One last question:

When you say "remove the HAs on the affected server from the underlying storage", does that mean I need to delete the "img" files of each device?
yaroslav (staff)
Staff
Posts: 2359
Joined: Mon Nov 18, 2019 11:11 am

Mon May 24, 2021 8:05 am

Exactly, delete .img, .swdsk, and _HA.swdsk. Try the procedure for some test device first.
Let me know if you have more questions.
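
Before deleting anything, it can help to list exactly which files belong to the test device. The following is a minimal dry-run sketch in Python, not a StarWind tool. It assumes the files are named `<image name>.img`, `<image name>.swdsk`, and `<image name>_HA.swdsk` and sit in a single folder; the function name `find_ha_files`, the folder path, and the image name `testdevice` are hypothetical placeholders.

```python
from pathlib import Path

# Suffixes named in the thread; "<image name>" + suffix is an assumed naming scheme.
HA_FILE_SUFFIXES = (".img", ".swdsk", "_HA.swdsk")


def find_ha_files(storage_dir, image_name):
    """Return the existing files for one HA image without deleting anything."""
    folder = Path(storage_dir)
    candidates = [folder / f"{image_name}{suffix}" for suffix in HA_FILE_SUFFIXES]
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    # Hypothetical storage folder and image name; replace with the test device's values.
    for path in find_ha_files(r"D:\StarWind\test", "testdevice"):
        print("would delete:", path)
```

The sketch only prints candidates, so the list can be reviewed against the Management Console before any file is removed by hand.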
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Mon May 24, 2021 2:12 pm

I have removed the HA device from the healthy node by running the RemoveHAPartner.ps1 script. Do I now need to run the AddHAPartner.ps1 script from the same healthy node or from the affected node?
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Mon May 24, 2021 2:22 pm

I also need some guidance on the fields listed below. In our case, the HA image name on the healthy node is "linuxstor" with 977 GB of storage, and the device name is HAImage4. We are a little confused about targetAlias2 and poolName2: what should we enter in these two fields? Alternatively, could you share a sample script based on our scenario?

Node 1 (Healthy)
SyncInterfaceIP: 10.10.10.6
HBInterface: 192.168.100.6

Node 2 (Affected)
SyncInterfaceIP: 10.10.10.8
HBInterface: 192.168.100.9

Device name: HAImage4
HA Image name: linuxstor

Thanks
yaroslav (staff)
Staff
Posts: 2359
Joined: Mon Nov 18, 2019 11:11 am

Mon May 24, 2021 2:39 pm

When the removal script is triggered, it removes targets from the partner host. This means that you need to run the script on the healthy node to remove HA devices from the partner node.
Once the HA devices are removed from the Management Console, delete their files from the underlying storage.
Once done, run AddHAPartner on the healthy node.
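
As a reading aid, the order of operations described in this thread can be written down as a small checklist. This is only a Python sketch of the sequence discussed here, not something shipped with StarWind; the `Step` class and `steps_for` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    node: str    # "healthy" or "affected"
    action: str


# Order taken from the replies in this thread.
RECOVERY_STEPS = [
    Step("healthy", "Run RemoveHAPartner.ps1 to drop the replica to the affected node"),
    Step("affected", "Confirm the HA device is gone from the Management Console"),
    Step("affected", "Delete the device's .img, .swdsk and _HA.swdsk files from the underlying storage"),
    Step("healthy", "Run AddHAPartner.ps1 to recreate the replica"),
]


def steps_for(node):
    """Return (position, action) pairs for one node, keeping the global order."""
    return [(i, s.action) for i, s in enumerate(RECOVERY_STEPS, start=1) if s.node == node]


for i, step in enumerate(RECOVERY_STEPS, start=1):
    print(f"{i}. [{step.node}] {step.action}")
```

Both scripts run on the healthy node; the only work done on the affected node is verification and file cleanup.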

Adding the HA partner is similar to the script here: https://forums.starwindsoftware.com/vie ... p+3#p31505. Make sure to add the Management NIC as a heartbeat. At least two physical network cards are required: https://www.starwindsoftware.com/system-requirements.

Let me know if you have more questions.
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Tue May 25, 2021 6:10 am

I have removed the HA device and its underlying storage from the affected node. Now I am trying to run AddHAPartner.ps1 to add the HA device, but I am getting the error below. Please help.

Exception calling "AddPartner" with "1" argument(s): "Request to xxxxxx.xxxxx ( 10.10.10.6 ) : 3261
-
control 0x000000C97F44A8C0 -AddPartner:"" -PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:revmaxsr9-linuxstor" -Priority:"#p1=1" -nodeType:"#p1=1" -PartnerIP:"#p1=192.168.100.9:sync:3260:1" -AuthChapType:"#p1=none" -AuthChapLogin:"#p1=0b" -AuthChapPassword:"#p1=0b" -AuthMChapName:"#p1=0b" -AuthMChapSecret:"#p1=0b" -Replicator:"#p1=0"
-
200 Failed: invalid partner info.. "
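
The control command in an error like the one above is easier to review once its parameters are split out. The following Python sketch is a generic text parser, not part of StarWindX; the function name `parse_control_params` is hypothetical, and the assumption that values carry a `#p1=` prefix is taken only from the pasted error text.

```python
import re

# Matches -Name:"value" pairs as they appear in the control command of the error text.
PARAM_RE = re.compile(r'-(\w+):"([^"]*)"')


def parse_control_params(error_text):
    """Return {parameter name: value} from the control command in the error text."""
    # The console hard-wraps long lines, so join them before matching.
    joined = error_text.replace("\r", "").replace("\n", "")
    params = {}
    for name, value in PARAM_RE.findall(joined):
        # Values in the pasted error carry a "#p1=" prefix; drop it for readability.
        params[name] = value[4:] if value.startswith("#p1=") else value
    return params


sample = (
    'control 0x000000C97F44A8C0 -AddPartner:"" '
    '-PartnerTargetName:"#p1=iqn.2008-08.com.starwindsoftware:revmaxsr9-linuxstor" '
    '-Priority:"#p1\n=1" -nodeType:"#p1=1" '
    '-PartnerIP:"#p1=192.168.100.9:sync:3260:1" -AuthChapType:"#p1=none"'
)

for name, value in parse_control_params(sample).items():
    print(f"{name} = {value!r}")
```

With the values side by side, the PartnerIP entry (192.168.100.9, marked "sync") can be compared against the interface list posted earlier, where 192.168.100.9 is given as a heartbeat address. Whether that mismatch causes the "invalid partner info" failure is not confirmed in this thread, but it is worth checking in the script.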
yaroslav (staff)
Staff
Posts: 2359
Joined: Mon Nov 18, 2019 11:11 am

Tue May 25, 2021 6:38 am

Did you remove the replicas to the unhealthy side with RemoveHAPartner? Did you confirm in the console that the device was removed?
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Tue May 25, 2021 6:59 am

Yes, I have removed the replicas from the unhealthy partner with RemoveHAPartner, and the device no longer shows up in the Management Console of the unhealthy node.
harish.patil
Posts: 23
Joined: Sun Oct 27, 2019 8:31 am

Tue May 25, 2021 7:29 am

I have also noticed that when I run the RemoveHAPartner script on the healthy node, it removes the partner from the healthy node but does not trigger the removal on the unhealthy node. I have to remove the device from the unhealthy node manually using the RemoveDevice and RemoveTarget scripts. Could that be the reason for the failure?
yaroslav (staff)
Staff
Posts: 2359
Joined: Mon Nov 18, 2019 11:11 am

Tue May 25, 2021 8:10 am

No. Did you delete the .img, _HA.swdsk, and .swdsk files from the unhealthy node?
Also, share the script you are using, please.