Asynchronous failover operations

ymsalem · Tue Feb 24, 2015 3:30 pm

Hi,
I'm trying to understand how Async replication works when failing over to a remote server. Please point me to any available details or documentation (I could not find any myself).

I'm testing now with two Win2012R2 VMs on two subnets (to simulate local & remote). I created a target and device on the local server and enabled Async replication to a device on the remote server. Everything worked, synchronization completed, and I can see replication happening when I write data on the local server.

However, using iSCSI initiator on a remote server, I tried to attach the remote device/LUN but I could not connect to the remote target - is this normal behavior?
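
One way to narrow this down is to first check plain TCP reachability of the remote target from the initiator machine, which separates a routing or firewall problem between the subnets from a refusal by the target itself. The following is a minimal sketch, assuming the target listens on the standard iSCSI port 3260 (the actual port in a given StarWind setup is an assumption to verify); `can_connect` is a hypothetical helper name, not part of any StarWind tooling.

```python
import socket

# 3260 is the IANA-registered iSCSI port; assumed here, verify for your target.
ISCSI_PORT = 3260


def can_connect(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Calling `can_connect("<DR server address>")` from the initiator machine returns `False` when nothing accepts the connection, which points at the network or firewall rather than at the replication state of the device. A `True` result only shows that the port is open, not that an iSCSI login would succeed.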

Now, to simulate a failure, e.g. shutting down the local server, how can I resume operations using the device/LUN on the remote server? Also, is there a way to automate this failover?

Thanks for your help!

Fri Feb 27, 2015 2:12 pm

It is not actually normal. The DR copy should be accessible in read-only mode.

Is there any chance you could provide us with the following:
· StarWind service logs at the time of the issue from all problematic SAN boxes
· Windows Application and System logs (in *.csv format) at the time of the issue from all problematic SAN boxes
· Detailed network diagram of SAN system
· Description of the actions that were performed before/at the time of the issue
· Approximate time frames when the issue happened

I'd appreciate it if you would separate the logs from different servers into different folders.

Please upload them to a file hosting service and share the link with us.

Thank you

ymsalem · Thu Mar 19, 2015 3:33 pm

Thank you for the reply.
Very sorry for the late reply .. I didn't know someone had replied; somehow I did not get a notification from the forum.

Anyway .. since then, I have created a new test lab on a Hyper-V host with all my test VMs.
1) I'm now using two machines on two different subnets to simulate PROD & DR.
2) Installed StarWind and created one LUN with Synchronous replication.
3) Then created a Windows 2012R2 cluster on the two machines.
4) Set up the SoFS role and created one share on that LUN.
5) On another two machines, created a SQL clustered instance and used that share for it.

Operations are good and failover is working OK.
1) The two PROD nodes of SoFS and SQL cluster go down -- to simulate a disaster.
2) StarWind raises a notification that the Node1 connection has been lost, and the LUN goes into an unsynchronized state.
3) SoFS fails over to DR and the LUN becomes available.
4) SQL starts OK on the DR node and access to the replicated LUN is OK.

Failback also worked OK.
1) The two PROD nodes of SoFS and SQL come back online.
2) StarWind connects to the PROD node again and the LUN becomes synchronized.
3) I was able to fail SoFS back to the PROD node.
4) SQL can now also fail back to the PROD nodes.

Next, I will repeat the above with some load and high IO to see how it behaves under pressure.

My questions ..
1) Is the above a supported and reliable setup with StarWind? I reviewed a lot of documentation, papers and posts but did not see this exact setup described, with this level of detail about replication failover and failback.
2) Is there a way to create Synchronous replication of a whole drive .. i.e. do I still have to create a virtual disk on the drive inside StarWind to use replication?
3) If I move this same setup to real production with physical machines, network and storage .. will there be any limitations .. e.g. number of LUNs to replicate? Replication volume across all LUNs? .. etc.

As for my original question about Async replication, I repeated that in the new lab as well. Only when node1 goes down am I able to connect to the iSCSI target on the DR node .. but even then, in Disk Management, I can see the disk but it is in an offline state and I can't bring it online. Please advise.

Thank you .. apologies again for the late reply and the very long message.

Thank you .. apology again for a late reply and very long message.

Regards,,

Fri Mar 20, 2015 1:50 pm

1. I would appreciate a diagram before I confirm it.
2. Yes, HA requires .img files. HA cannot be used with exported disks.
3. We have no limitation on the number of targets, but only one LUN per target is allowed if you plan to use HA. We also have no limitation on the size of a LUN, but the Console only lets you create a LUN of up to 16 TB, which can then be extended. The upcoming release will remove this size limitation.

Regarding how to mount and use the DR copy:
The DR copy can be used in read-only mode and should be mounted via the StarWind Console beforehand. The primary copy should at least be offline in Disk Management.

Please feel free to ask me questions.

ymsalem · Fri Mar 20, 2015 5:17 pm

1) Diagram attached - let me know if you need any more details. The only difference between this diagram and my current testing is that I'm using a two-node Windows cluster instead of four nodes at each of the Primary and DR sites (simulated by two different virtual subnets).
https://drive.google.com/file/d/0ByDsUJ ... sp=sharing

2) OK - thanks.
3) OK thanks, but just to confirm I understand this properly .. if I'm using HA LUNs, I will need as many targets as there are LUNs - correct?

For #2 & #3 above, is there another way in StarWind to achieve the same goals, aside from HA LUNs?

Regarding your comments on DR mounting - are you referring to Async or in general?
So, what happens if the Primary is still down? That is the whole point of having the DR .. isn't it?

Thanks for your feedback and support

Tue Mar 24, 2015 6:41 pm

1. OK, if you plan to use 4 servers per data center, then I can confirm that it is a supported configuration. But please note that StarWind allows you to create 2-node or 3-node HA. For example, if you have 3 TB per host and plan to use 3-node HAs of 1 TB each, then the plan is:

1st HA between nodes 1, 2 and 3

2nd - 2, 3 and 4

3rd - 3, 4 and 1

4th - 4, 1 and 2
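
The rotation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a StarWind tool: the function names are hypothetical, and it assumes every member node stores a full copy of each HA device it participates in.

```python
def ring_ha_layout(nodes, replicas=3):
    """One HA device per node, each spanning `replicas` consecutive
    nodes, wrapping around at the end of the node list."""
    n = len(nodes)
    return [[nodes[(start + k) % n] for k in range(replicas)]
            for start in range(n)]


def used_capacity_tb(layout, device_size_tb):
    """Storage consumed on each node, assuming every member node
    holds a full copy of each device it belongs to."""
    usage = {}
    for members in layout:
        for node in members:
            usage[node] = usage.get(node, 0) + device_size_tb
    return usage


layout = ring_ha_layout([1, 2, 3, 4])
for index, members in enumerate(layout, start=1):
    print(f"HA {index}: nodes {members}")
# Per-node usage with 1 TB devices: {1: 3, 2: 3, 3: 3, 4: 3}
print(used_capacity_tb(layout, 1))
```

With four nodes and 1 TB devices, every host ends up holding three replicas, so the full 3 TB per host is consumed and four 1 TB HA devices are available in total.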

Now regarding replication between data centers: it is possible, and HA should be configured as described above, but it requires really good bandwidth between the sites. Otherwise the connection between the data centers will be the bottleneck of the whole configuration and performance will be poor.

3. You're right.

Could you please clarify this:

ymsalem wrote: For #2 & #3 above, is there other way in Starwind to achieve the same goals - aside from the HA luns.

Initially I thought that you meant Async replication, but from your diagram it looks like a fully functional 8-node cluster. Please follow the same approach as above. In this case you have active-active partners and no DR.

This might be helpful: https://www.starwindsoftware.com/techni ... _nodes.pdf

ymsalem · Wed Mar 25, 2015 2:19 pm

Alright .. thank you for all replies and support.

Regards,,

Fri Mar 27, 2015 10:43 am

Good to hear that!
Thank you and have a great weekend!