Recovery after disk failure

Software-based VM-centric and flash-friendly VM storage + free version

jdeshin
Posts: 63
Joined: Tue Sep 08, 2020 11:34 am

Tue Sep 08, 2020 11:44 am

Hello all.
I recently downloaded the free version of StarWind VSAN for testing and created a 2-node scale-out file cluster. To emulate a disk failure, I removed the disk from one of the nodes and replaced it with a blank one. As expected, I received a warning that the disk is not synchronized. I cannot find any cmdlets or other information about restoring a StarWind cluster after a disk failure. How can I restore StarWind functionality after a disk failure?
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Tue Sep 08, 2020 3:39 pm

Welcome to StarWind Forum.
So, you have removed the disk on the partner side. Did you delete it from the underlying storage?
If you did, please re-create the replica with AddHaPartner.ps1 from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell.
jdeshin

Wed Sep 09, 2020 2:18 pm

Do I need to delete the existing targets and devices that appear on the node with the failed disk before running the PowerShell script?
yaroslav (staff)

Wed Sep 09, 2020 2:34 pm

No.
Could you provide me with a screenshot from the affected node, please?
Please tell me exactly what you did with the disk. Did you delete it from the underlying storage or in StarWind?

Please share the screenshot with me.
jdeshin

Thu Sep 10, 2020 6:28 am

Hello Yaroslav.
I ran my experiment on Hyper-V virtual machines. Each virtual machine has two disks: one for the system and another for StarWind data.
I installed and configured StarWind VSAN Free and then set up a scale-out two-node file server.
Then I deleted the data disk from one of the nodes. The file cluster continued to work perfectly! StarWind software is really cool!
Then I created a new VHDX disk and attached it in place of the "lost" disk.
The screenshot from the StarWind console is in the attached file:
failure.png
Should I delete the devices and targets with PowerShell before running the script that you suggested?
yaroslav (staff)

Thu Sep 10, 2020 6:57 am

Hey,

So, you have deleted the disk from the storage and created a new, blank one. Am I correct? Did you delete it on SW1?
If so, just restart the service on SW1 or try running SyncHADeviceAdvanced.ps1 from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell for HAImage2.
If that does not work, delete the target [for the HA device that was deleted before], delete the image file from the underlying storage, and recreate the replica of the HA device that "survived" with AddHaPartner.ps1 from the StarWindX folder.

Let me know if that works.
jdeshin

Thu Sep 10, 2020 7:20 am

Yes, I deleted the data disk on sw1.
OK. I will try the steps you described and let you know the results.

P.S.
Another way that works is to back up the swdsk files on each node, restore them on the node that failed, and then create an image with the same parameters and file name.
But I think it is not the best way :)
yaroslav (staff)

Thu Sep 10, 2020 7:53 am

That's a good idea for a lab setup to learn how StarWind VSAN works.
BUT do not do that once you have production data on the image. It is better just to recreate the replica.
jdeshin

Thu Sep 10, 2020 11:53 am

After restarting the StarWind service I got the following result:
afterrestart.png
Running SyncHADeviceAdvanced.ps1 on the "lost" node sw1 has no effect, because Get-Device -server $server -type [StarWindDeviceTypeStr]::HAIMAGE returns no results.
Running the script on the sw2 node has no effect either, because that node is in the synchronized state.
Do I need to delete all devices and targets from the sw1 node? If so, will the data about the HA partner be deleted from node sw2?
yaroslav (staff)

Thu Sep 10, 2020 12:24 pm

Greetings,

You need to remove targets on SW1. No data will be removed from SW2.
Go to SW2 and start replication to SW1.
jdeshin

Thu Sep 10, 2020 1:29 pm

I cannot find any cmdlet for enumerating targets in the StarWindX PowerShell module.
How can I enumerate the targets that need to be deleted?
Should I use the standard Get-IscsiTarget cmdlet and manually select the target names for deletion?
Should I use the Remove-Target cmdlet to remove the targets?
yaroslav (staff)

Thu Sep 10, 2020 1:57 pm

Greetings,

You can remove them in the StarWind.cfg file. Stop StarWindService, make a copy of StarWind.cfg (located in C:\Program Files\StarWind Software\StarWind), and edit StarWind.cfg.
Find the part that looks like this:

Code:

    <device file="My Computer\S\Witness\Witness.swdsk" node="0" name="imagefile1"/>
    <device file="My Computer\S\CSV1\CSV1.swdsk" node="0" name="imagefile2"/>
    <device file="My Computer\S\CSV2\CSV2.swdsk" node="0" name="imagefile3"/>
    <device name="HAImage1" OwnTargetName="iqn.2008-08.com.starwindsoftware:sw-hca-01-witness" file="My Computer\S\Witness\Witness_HA.swdsk" serialId="5C09943DFF17FA16" header="65536" reservation="no"/>
    <device name="HAImage2" OwnTargetName="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv1" file="My Computer\S\CSV1\CSV1_HA.swdsk" serialId="9DD1FC07B8D8696C" header="65536" reservation="no"/>
    <device name="HAImage3" OwnTargetName="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv2" file="My Computer\S\CSV2\CSV2_HA.swdsk" serialId="DA85CD20E051248B" header="65536" reservation="no"/>
  </devices>
  <targets>
    <!--<target name="targetname" alias="my target" devices="ImageFile0,ImageFile1"/>-->
    <!--<target name="targetname" alias="my target" devices="ImageFile0,ImageFile1" XcopyMode="3"/>-->
    <!--Target XcopyMode parameter value: 0 - none, 1 - VAAI, 2 - ODX, 3 - ODX+VAAI, 4 - use global options
        By default, this value = 4 that means we choose mode from 0 to 3 depends on VaaiExCopyEnabled and OdxEnabled global options.-->
    <target name="iqn.2008-08.com.starwindsoftware:sw-hca-01-witness" devices="HAImage1" alias="Witness" clustered="Yes" node="1"/>
    <target name="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv1" devices="HAImage2" alias="CSV1" clustered="Yes" node="1"/>
    <target name="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv2" devices="HAImage3" alias="CSV2" clustered="Yes" node="1"/>
  </targets>
  <permissions>
Remove each <device> and <target> entry, or comment them out as follows:

Code:

<!--<device name="HAImage3" OwnTargetName="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv2" file="My Computer\S\CSV2\CSV2_HA.swdsk" serialId="DA85CD20E051248B" header="65536" reservation="no"/>-->
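
For a node with many entries, the commenting step described above can be scripted. The following is a minimal Python sketch, not a StarWind tool. It assumes that each entry sits on a single line as a self-closing <device .../> or <target .../> element with a name attribute, as in the excerpt above. The function names, the regular expression and the SAMPLE text are illustrative assumptions. Anything like this should only be run with StarWindService stopped and a copy of StarWind.cfg saved first.

```python
import re

# Assumption: one active entry per line, self-closing, with a lowercase name="..." attribute.
# Lines already wrapped in <!-- ... --> do not match, because they start with "<!--".
ENTRY = re.compile(r'^(\s*)(<(device|target)\s[^>]*?\bname="([^"]*)"[^>]*/>)\s*$')

# Illustrative text modeled on the StarWind.cfg excerpt in this thread.
SAMPLE = r'''  <devices>
    <device name="HAImage3" OwnTargetName="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv2" file="My Computer\S\CSV2\CSV2_HA.swdsk" serialId="DA85CD20E051248B" header="65536" reservation="no"/>
  </devices>
  <targets>
    <!--<target name="targetname" alias="my target" devices="ImageFile0,ImageFile1"/>-->
    <target name="iqn.2008-08.com.starwindsoftware:sw-hca-01-csv2" devices="HAImage3" alias="CSV2" clustered="Yes" node="1"/>
  </targets>
'''


def list_entries(cfg_text):
    """Return (tag, name) pairs for every active <device>/<target> line."""
    found = []
    for line in cfg_text.splitlines():
        match = ENTRY.match(line)
        if match:
            found.append((match.group(3), match.group(4)))
    return found


def comment_out(cfg_text, names):
    """Wrap active entries whose name is in `names` in <!-- -->; keep all other lines as-is."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        match = ENTRY.match(body)
        if match and match.group(4) in names:
            body = f"{match.group(1)}<!--{match.group(2)}-->"
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    # Enumerate what is currently defined, then comment out the chosen entries.
    for tag, name in list_entries(SAMPLE):
        print(tag, name)
    print(comment_out(SAMPLE, {"HAImage3"}))
```

The sketch works on text rather than on a parsed XML tree so that the existing comments and layout of the file stay untouched. To apply it to a real file, the text would be read from the saved copy and the result written back to StarWind.cfg; the file's encoding is not stated in this thread and would have to be checked first.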
Alternatively, you can just reinstall StarWind VSAN on SW1.

Let me know if that helps.
jdeshin

Mon Sep 14, 2020 6:52 am

Hello Yaroslav.
I followed the steps from your message and got the following results:
After deleting the devices and targets on the "failed" node:
Безымянный1.png
Results of running the script:
Безымянный.png
yaroslav (staff)

Mon Sep 14, 2020 7:31 am

Hi,

Can I have the script here please?
jdeshin

Mon Sep 14, 2020 8:07 am

The script is in the attachment:
AddHaPartner.zip