PowerShell to recreate replica?

wallewek
Posts: 114
Joined: Wed Sep 20, 2017 9:13 pm

Thu Oct 12, 2017 9:27 pm

I have a feeling I might be missing something...

I can't find any PowerShell examples for how to recreate and resynchronize replica images, in the case of a host or device failure.

Seems like it might be something like the Replication Manager "Add Replica" function, except that the replica is already defined.

If the replica host OS is up and StarWind is up and running, could it really be as simple as, say, recreating the target folders? Wouldn't we have to get StarWind to recreate the image file and start replicating to it?

How would that work, and is there any PowerShell example scripting for it?

-- Ken
------------------------
"In theory, theory and practice are the same, but in practice they're not." -- Yogi Berra
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Oct 13, 2017 9:39 am

Ken,

For this purpose you can use two of the sample scripts that ship with StarWindX. Their default path is "C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell\", and the particular scripts you need are GetHASyncState.ps1, to check the synchronization status, and syncHaDevice.ps1, to synchronize the device.
wallewek

Fri Oct 13, 2017 9:57 pm

Thanks Boris,

As it happens, I spent some time writing an enhanced-reporting variation on the GetHASyncState.ps1 script just yesterday. But I'm a little bit confused.

GetHASyncState.ps1 is quite the script. Despite its name, it really appears to do much more than get states (which could really throw somebody who thought they were just going to see HA states). It actually contains commands:

Code:

$server.ExecuteCommand( 0, "restoreHAPartnerNode", $params)
and

Code:

#$server.ExecuteCommand( 0, "restoreCurrentHANode", $params)
which, I gather you are saying, will fully recreate partner images in a blank partition. Could you confirm that's what the above server commands do, please? I can't find any other documentation on them at all.

The other script, syncHaDevice.ps1, however, does seem to only trigger synchronization. I'm guessing it would actually be redundant if GetHASyncState.ps1 does its job.

-- Ken
Boris (staff)

Tue Oct 17, 2017 11:42 am

Ken,

In fact, both scripts do the same thing: they both trigger synchronization. Your concern about a partition being recreated completely blank does not have any grounds, as what you see as "restoreHAPartnerNode/restoreCurrentHANode" is an internal parameter of the service and does not mean the partition is going to be blanked. It simply triggers the synchronization process for the partner node or the local node respectively. To get only the status of the HA device, feel free to modify the script and remove the following lines from it:

Code:

$params = new-object -ComObject StarWindX.Parameters        
$params.AppendParam("deviceID",$device.DeviceId)
$params.AppendParam("partnetTargetName",$partnerTargetName)
$server.ExecuteCommand( 0, "restoreHAPartnerNode", $params)
But, most importantly, the comment line at the top of the initial script does say what the script does. StarWindX scripts can be customized according to your needs.
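
A minimal sketch of that customization, assuming the $server and $device objects are obtained the same way the sample script obtains them: the sync trigger is moved behind an opt-in switch, so a plain run only reports. The function name Show-HADevice and the -TriggerSync switch are made up for illustration and are not part of StarWindX; the four lines inside the if-block are the ones quoted above, unchanged.

```powershell
# Hypothetical wrapper (not part of StarWindX): report a device and only
# trigger synchronization when explicitly asked to.
function Show-HADevice {
    param(
        [Parameter(Mandatory)] $Server,   # connected server object from New-SWServer
        [Parameter(Mandatory)] $Device,   # one entry of $Server.Devices
        [string] $PartnerTargetName,      # partner target, as used in the sample script
        [switch] $TriggerSync             # opt in to starting synchronization
    )

    if ($TriggerSync) {
        # These are the lines quoted from the sample script, unchanged.
        $params = New-Object -ComObject StarWindX.Parameters
        $params.AppendParam("deviceID", $Device.DeviceId)
        $params.AppendParam("partnetTargetName", $PartnerTargetName)
        $Server.ExecuteCommand(0, "restoreHAPartnerNode", $params)
    }

    # Always emit the device object so PowerShell prints its properties.
    $Device
}
```

Called without -TriggerSync, the function never touches ExecuteCommand, which is the "status only" behaviour described above.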
wallewek

Tue Oct 17, 2017 5:38 pm

Thank you Boris, but I'm not sure we're both talking about the same thing here.

I asked about
"how to recreate and re-synchronize replica images, in the case of a host or device failure."
In such an instance, the replacement storage would be completely blank: no partitions, nothing. And I want to know what process to follow, to re-establish StarWind replication via command line, under such circumstances. I have not found StarWind documentation to cover this. Did I overlook it?

Now, I can see having to manually re-create the original host partitions, sizes, filesystem types and drive letters before StarWind replication can start. But after that, we have one partner host with blank partitions. How can you re-establish synchronization if we start with the target image files completely missing on one host?

I'm not even touching on whether or how we might need to re-configure iSCSI target connections.

You said
Your concern about a partition being recreated completely blank does not have any grounds, as what you see as "restoreHAPartnerNode/restoreCurrentHANode" is the internal parameter of the service and does not mean the partition is going to be blanked.
I wasn't concerned about the script blanking the partition. I was concerned about the script being able to handle a partition that is already blank due to external causes such as hardware failure -- which is, after all, one of the things StarWind is supposed to be able to protect us from, right?

Does this help clarify the question?

-- Ken
Boris (staff)

Wed Oct 18, 2017 3:48 pm

Ken,

At the moment the only option available for free customers using PowerShell scripts with no StarWind Management Console available is to migrate data from the old HA to the new one. Yet, we are working on a script that would recreate the partner replica to the existing HA device in case of partner node failure. It will be available in one of the future releases as a part of the StarWindX pack.
wallewek

Thu Oct 19, 2017 4:39 pm

Thanks Boris. Ah. OK, I get that. Hmm. Let me see if I understand what that means...

When you say "migrate data from the old HA to the new one", I take that to mean something like cloning the host storage LUN, partition, whatever, to the new replacement for the failed storage. That doesn't sound too hard. I'd probably use something like ROBOCOPY /MIR to mirror (clone) the existing storage, but other methods should work too. Of course I expect we'd have to stop StarWind services while the copy is in progress.

And I take it that, once this is done and the services restarted, we might need to use GetHASyncState.ps1 and/or syncHaDevice.ps1 to restore normal synchronization.

(I'm ignoring failures that would require reinstalling StarWind software on the failed system -- i.e., I'm assuming the host OS partition is properly backed up.)

One optimization occurs to me: I'd expect that once the StarWind sync is triggered, it would have to do a full re-synchronization, which could take quite a while and completely re-copy all the data across to the new storage again. It seems unlikely to me that the sync mechanism would support verifying the manual copy and skipping the full re-sync, right? So that means copying all of the data twice, doubling the time required to re-establish full synchronization.

There's a ROBOCOPY option that could potentially help speed things greatly, and minimize or even eliminate the amount of time the StarWind services would have to be stopped. ROBOCOPY /CREATE will create all copied files as zero-length (empty). Would the sync process re-extend the cloned files to the right length while syncing the data?
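
To make the two variants concrete, here is a rough sketch. The helper Get-CloneArguments and the example paths are hypothetical; /MIR, /E and /CREATE are standard ROBOCOPY switches. Nothing here is StarWind-specific, and whether the sync accepts zero-length files is exactly the open question.

```powershell
# Hypothetical helper: build the ROBOCOPY argument list for either a full
# mirror or a structure-only clone (directory tree plus zero-length files).
function Get-CloneArguments {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination,
        [switch] $StructureOnly
    )

    if ($StructureOnly) {
        # /E copies subdirectories, including empty ones;
        # /CREATE creates the directory tree and zero-length files only.
        return @($Source, $Destination, '/E', '/CREATE')
    }

    # /MIR mirrors the tree: /E plus purging extra files at the destination.
    return @($Source, $Destination, '/MIR')
}

# Example with placeholder paths:
# $cloneArgs = Get-CloneArguments -Source 'D:\StarWind' -Destination 'E:\StarWind' -StructureOnly
# robocopy @cloneArgs
```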

If this process works, and is documented (e.g., right here), you might not need to write that script at all!

-- Ken
wallewek

Thu Oct 19, 2017 11:44 pm

BTW, this very simple PowerShell script lists all status info on all local StarWind devices, no editing required.

Code:

Import-Module StarWindX

# Connect to the StarWind service on the local host
$server = New-SWServer -host 127.0.0.1 -user root -password starwind -port 3261
$server.Connect()
if ( $server.Connected )
{
    # Output each device object; PowerShell prints its properties
    foreach($device in $server.Devices)
    {
        $device
    }

    $server.Disconnect()
}
else
{
    "Server not connected"
}
Boris (staff)

Mon Oct 23, 2017 3:41 pm

Ken,

The script you have shared will work, that is true.
Under data migration I meant creating a new device on server 2 (if we assume server 2 was faulty), connecting it via iSCSI and copying all information from the old volume on server 1 to the new HA device on server 2.
As for synchronization, it is done at the binary level, so pre-creating files in any way will not speed it up.
wallewek

Tue Oct 24, 2017 11:25 am

Thank you Boris.

I do get that there's nothing we can do to make the sync itself run faster. I was looking for a way to speed up the preceding manual copy, i.e. how faithful does that copy have to be, before sync will start? I gather it's not enough to simply recreate the partition and empty folders, but what if we created empty, zero-length image files as well? That could be very fast.

Pre-copying those files completely is what would take the most time, and complete data content copies are just wasted, as the content would immediately be overwritten by the sync. So I'm just wondering if the sync would extend the image files properly while it is replicating them, or whether the sync would insist that faithful copies must exist there before it starts and overwrites them? (Robocopy can create empty zero-length files, but I don't know of any simple tool that would quickly pre-extend the empty copied files without actually doing a full content copy.)
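
On the point about pre-extending files: one approach that should be quick, as far as I know, is to set the file length through .NET, which on NTFS sets the size without writing the content. A sketch follows; the function name Set-FileLength and the example paths are made up, and whether StarWind would accept such a file as a sync target is not something this shows.

```powershell
# Hypothetical helper: create (or open) a file and set its length without
# copying any content into it.
function Set-FileLength {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [long]   $Length
    )

    # Resolve relative paths against the PowerShell location rather than
    # the process working directory, which .NET would otherwise use.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    $stream = [System.IO.File]::Open($fullPath,
        [System.IO.FileMode]::OpenOrCreate,
        [System.IO.FileAccess]::Write)
    try {
        # Extends or truncates the file to exactly $Length bytes.
        $stream.SetLength($Length)
    }
    finally {
        $stream.Dispose()
    }
}

# Example: match the size of the surviving image (placeholder paths)
# Set-FileLength -Path 'E:\StarWind\image.img' -Length (Get-Item 'D:\StarWind\image.img').Length
```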

Granted, it's a subtle point, and the straight copy is certainly simpler, if slower. If you're not sure, I'll probably just try it and report back.

-- Ken
Boris (staff)

Tue Oct 24, 2017 1:18 pm

Ken,

Synchronization is triggered immediately when a write operation is done on the volume. As for transferring the information from one volume to another on the same node, you can use whatever tool you prefer, as it is a simple copy-paste operation.
Feel free to report back any findings or scenarios you manage to test. We would appreciate it very much.
wallewek

Wed Oct 25, 2017 5:35 pm

It doesn't feel like we're talking about the same things, Boris. Terms like "volume" and "device" seem to have multiple meanings. And I'm afraid you lost me when you said:
Under data migration I meant creating a new device on server 2 (if we assume server 2 was faulty), connecting it via iSCSI and copying all information from the old volume on server 1 to the new HA device on server 2.

It sounds like you're talking about using the StarWind management console to create a new device and replicate it from the existing one, as opposed to working at the host level. Not sure if there is PowerShell for what you describe, either.

What I had in mind was working at the host OS level to replace the failed storage and get it ready for StarWind sync to take over. My question was what had to be in place, at the host level, before the VSAN sync mechanism will accept the replaced storage as a legitimate VSAN cluster member, so that it will start syncing the image.

As I see it, here's what's needed in order to replace failed host storage (hardware or LUN) on one host.

1. The failed host storage has to be replaced and repartitioned at the host level, with nominally the same partitioning as on the failed one, which will likely be also the same as the sync partner's partitioning.

2. At this point it seems likely that the StarWind-related folder, image and metadata files in that host partition will need to be replaced, one way or another. I don't see any way -- or any point -- to try to copy individual files at the VSAN internal virtual storage level until the host-based images are recognized and synchronized by the VSAN. And the sync will look after them after that anyway.

I tried stopping the VSAN services on one node, using ROBOCOPY to copy the filesystem structure and empty files (which took only an instant) from the old partition to a new one, swapping drive letters and restarting. It didn't work at all -- sync didn't even seem to see the replaced filesystem. It would help to know what it looks for. Maybe copying the metadata files verbatim...? It might look inside the image file itself, but that doesn't make sense to me. Maybe the image files on all/both sync partners have to have identical sizes before sync will start.

I think I'm going to try a complete partition copy from good host to failed host storage. One thing that's unclear is whether it cares which host I take the replica from. I'm also not sure whether simply stopping the VSAN service (which presumably stops the iSCSI targets) will undo the locks on the files at host level so I can copy them. Which would mean the entire cluster would have to be down during that whole, long, redundant copy.
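
One way to check the lock question before starting a long copy is to try opening each file exclusively. This is only a sketch: Test-FileLocked is a made-up helper, it says nothing about why a file is held open, and a file can become locked again right after the check.

```powershell
# Hypothetical helper: returns $true if the file cannot be opened
# exclusively, which usually means another process holds it open.
function Test-FileLocked {
    param([Parameter(Mandatory)] [string] $Path)

    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    if (-not (Test-Path -LiteralPath $fullPath -PathType Leaf)) {
        throw "File not found: $fullPath"
    }

    try {
        # FileShare.None requests exclusive access; it fails if the file is in use.
        $stream = [System.IO.File]::Open($fullPath,
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::None)
        $stream.Dispose()
        return $false
    }
    catch [System.IO.IOException] {
        return $true
    }
}

# Example (placeholder path):
# Test-FileLocked -Path 'D:\StarWind\image.img'
```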

I guess I'll find out.

-- Ken
wallewek

Fri Oct 27, 2017 1:27 am

Well, THAT was interesting...

It looks to me that, in a (Server 2016) two-host cluster, if I stop the StarWind VSAN service on one host, it automatically stops that service in the other host as well! Am I imagining things? Kind of makes me wonder how we are supposed to proceed, if we want to keep the cluster online while taking one host offline for maintenance.

In other news, I tried another approach for optimized copying and swapping of a cluster hard drive. No joy, again. And even though I took the one cluster host offline and tried to stop the SW VSAN service, I wound up with a crashed cluster, including all guest VMs.

Next try will have to be a full, un-optimized raw copy of a host hard drive from the "good" host. Only question is, what do I have to do, to let me copy that drive without file access conflicts?

-- Ken
PoSaP
Posts: 49
Joined: Mon Feb 29, 2016 10:42 am

Fri Oct 27, 2017 1:56 pm

wallewek wrote: Next try will have to be a full, un-optimized raw copy of a host hard drive from the "good" host. The only question is, what do I have to do, to let me copy that drive without file access conflicts?
Guys, sorry for interrupting your discussion, but I think you forgot about one thing, wallewek.
When copying, note that there are three files in the StarWind folder: one *.img image file and two header files, *.swdsk and *.swdsk_HA. The header files are different on each host.
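
A quick way to see those files before copying anything, as a sketch only: Get-StarWindFile is a hypothetical helper, the folder is a placeholder, and the extension list is just the three mentioned above.

```powershell
# Hypothetical helper: list the StarWind-related files under a folder,
# using the three extensions mentioned in this thread.
function Get-StarWindFile {
    param([Parameter(Mandatory)] [string] $Folder)

    Get-ChildItem -Path $Folder -Recurse -File -Include '*.img', '*.swdsk', '*.swdsk_HA' |
        Select-Object FullName, Length, LastWriteTime
}

# Example (placeholder folder):
# Get-StarWindFile -Folder 'D:\StarWind'
```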
Michael (staff)
Staff
Posts: 317
Joined: Thu Jul 21, 2016 10:16 am

Fri Oct 27, 2017 4:01 pm

wallewek wrote: It looks to me that, in a (Server 2016) two-host cluster, if I stop the StarWind VSAN service on one host, it automatically stops that service in the other host as well! Am I imagining things? Kind of makes me wonder how we are supposed to proceed, if we want to keep the cluster online while taking one host offline for maintenance.
Hello wallewek,
Could you please collect the logs from both nodes using StarWind Log Collector https://knowledgebase.starwindsoftware. ... collector/ and log a support case here: https://www.starwindsoftware.com/support-form ?

As for the file copying, PoSaP is correct: you have to edit all StarWind configuration files (.swdsk) and the StarWind config to restore HA manually. StarWind VSAN will do a full synchronization in any case.
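
Before editing a header by hand, it may help to see exactly which lines differ between the two hosts. The sketch below assumes the header files are plain text, which this thread does not confirm; Compare-HeaderFile is a made-up helper and the paths are placeholders. It does not say which fields need to be changed.

```powershell
# Hypothetical helper: show the lines that differ between two header files,
# e.g. the local .swdsk and a copy fetched from the partner host.
# Assumes both files are non-empty plain text.
function Compare-HeaderFile {
    param(
        [Parameter(Mandatory)] [string] $LocalPath,
        [Parameter(Mandatory)] [string] $PartnerPath
    )

    Compare-Object -ReferenceObject (Get-Content -LiteralPath $LocalPath) `
                   -DifferenceObject (Get-Content -LiteralPath $PartnerPath)
}

# Example (placeholder paths):
# Compare-HeaderFile -LocalPath 'D:\StarWind\image.swdsk' -PartnerPath 'C:\Temp\partner-image.swdsk'
```

In the output, a SideIndicator of "<=" marks a line found only in the local file and "=>" a line found only in the partner copy.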