RAID device issue - Rebuild starwind config?

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: art (staff), anton (staff), Anatoly (staff), Max (staff)

BenM
Posts: 35
Joined: Wed Oct 24, 2018 7:17 am

Fri Jun 28, 2019 9:10 am

Guys,

I appreciate this may, in fact, require a support ticket; however, I would like to run my situation past you:

We have a two-node StarWind Free vSAN cluster built on two Dell R620 boxes with LSI PERC 710i RAID controllers.
One of the nodes has flagged a drive as 'Predicted failure'. To fix this, one has to delete the RAID config from the controller and start again with the physical drives in different RAID slots.
The system disk for the affected node does not share any physical devices with the data volume.
I appreciate that destroying the RAID volume will break StarWind - what was a nice drive full of synched data will end up empty.

So I have had a think and come up with a plan of action (a rough PowerShell sketch of the drain/stop steps follows the list). Bear in mind I am well past the 30 days during which the StarWind GUI will let me change the config.
  • Pause the affected node, draining the roles
  • Shut down VMs (most of them; DCs would be problematic)
  • Disable StarWind on the affected node once everything is synched
  • Copy the StarWind data off the data volume onto local removable storage (yes, it will take forever)
  • Shut down the paused node
  • Perform RAID magic to delete the volume and recreate it with the physical disks in new slots, replacing as necessary to remove any hard read faults
  • Reboot the node
  • Format the drive in Windows
  • Copy the data back from local removable storage
  • Enable StarWind but don't start the service
  • Reboot - this should allow StarWind to come back, and iSCSI will hopefully recover
  • Watch StarWind resynch as necessary and enjoy another period of trouble-free cluster operation
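
For reference, here is the drain/stop/copy part of that plan as a rough PowerShell sketch - node names, VM names and paths are placeholders, and I am assuming the service name is 'StarWindService' as in a default install:

  # Drain the cluster roles off the affected node (run from any node in the cluster)
  Suspend-ClusterNode -Name "NODE2" -Drain

  # Shut down the VMs that will stay off during the work (DCs excepted)
  Stop-VM -Name "VM-App1","VM-App2" -Force

  # Once StarWind reports the devices as synchronized, stop and disable the service on the affected node
  Stop-Service -Name "StarWindService"
  Set-Service -Name "StarWindService" -StartupType Disabled

  # Copy the StarWind image files off the data volume onto the removable storage
  robocopy "D:\StarWind" "E:\StarWind-backup" /E /R:1 /W:1

  # Shut the node down ahead of the RAID work
  Stop-Computer

After the rebuild it would be the same in reverse: copy the data back, set the service back to Automatic, and reboot.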
Is there anything I have missed - or should I just log a support call and get them to step me through everything?

Thanks for your time

Ben
Serhi
Posts: 21
Joined: Mon Mar 25, 2019 4:01 pm

Fri Jun 28, 2019 9:40 am

Hi

Don't forget to save the starwind.cfg file.

After installing StarWind, stop the service, replace starwind.cfg, and start the service again.
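
Something like this, for example - assuming the default install path (C:\Program Files\StarWind Software\StarWind\StarWind.cfg); adjust the paths to your setup:

  # Before the rebuild: keep a copy of the configuration somewhere safe
  Copy-Item "C:\Program Files\StarWind Software\StarWind\StarWind.cfg" "E:\StarWind-backup\StarWind.cfg"

  # After reinstalling StarWind: stop the service, put the saved config back, start it again
  Stop-Service -Name "StarWindService"
  Copy-Item "E:\StarWind-backup\StarWind.cfg" "C:\Program Files\StarWind Software\StarWind\StarWind.cfg" -Force
  Start-Service -Name "StarWindService"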

BR
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Fri Jun 28, 2019 10:43 am

Hi Ben,
"Shut down VMs (most of them, DCs would be problematic)"
You will be rebuilding only one node, so the VMs can run on the other, synchronized node if you have enough compute resources.
"Format drive in Windows"
And assign it the same drive letter.
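
For example, once the rebuilt array shows up in Windows as a new raw disk (the disk number and letter below are examples only - check them before running anything):

  # Find the new, uninitialised disk
  Get-Disk | Where-Object PartitionStyle -Eq 'RAW'

  # Initialise, partition and format it, reusing the old drive letter (D: here)
  Initialize-Disk -Number 2 -PartitionStyle GPT
  New-Partition -DiskNumber 2 -UseMaximumSize -DriveLetter D |
      Format-Volume -FileSystem NTFS -NewFileSystemLabel "StarWind-Data"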
BenM
Posts: 35
Joined: Wed Oct 24, 2018 7:17 am

Wed Jul 17, 2019 10:53 am

First off - apologies for the delayed response. A new baby was born shortly after I posted the message, and then I spent an interesting week camping out in hospitals owing to a critically ill baby... all sorted now. Yippee.

Shutting down the VMs - that step was there to minimise writes to the CSV and speed up the resynch after I bring the node back online. The cluster is easily capable of running all the VMs on one node. As people won't be using the system (much), it is tempting just to leave everything running and let the synch run after I bring the disk back online.

Format the new drive, and yes, definitely give it the same drive letter.

Do I need to save the StarWind.cfg file, given that I am not affecting the system disk of the cluster node? Unless it is rebuilt from somewhere on boot, I wouldn't have thought it would change.

I plan to start the fix next week or early the week after.

Thanks for your input.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Wed Jul 17, 2019 12:03 pm

Good news that everything is sorted now.
"Do I need to save the StarWind.cfg file, given that I am not affecting the system disk of the cluster node?"
That is more relevant to cases where you need to rebuild the OS drive as well.
BenM
Posts: 35
Joined: Wed Oct 24, 2018 7:17 am

Tue Aug 20, 2019 8:06 am

For completeness - I have just finished the rebuild of the RAID - it has worked as expected and was completely painless from the vSAN point of view.

The StarWind software performed without a hitch and, as I type this, is completing the re-synch (43 minutes over a 1Gb network connection to the other node).

Lessons learned - don't use local USB storage if you want a speedy save and restore; it took just shy of 3 days to copy 4.91TB in each direction :)
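
Rough arithmetic for anyone curious: 4.91 TB in just under 3 days is about 4.91e12 bytes over roughly 250,000 seconds, i.e. somewhere around 20 MB/s average - USB-2-class speed. The same copy over even a single 1Gb link (~110 MB/s) would be more like 12-13 hours each way.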

Thanks again for your help.

Ben
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Tue Aug 20, 2019 9:03 am

You are welcome :)