Starwind HA Cluster + HyperV

starneo · Fri Jan 28, 2011 10:47 am

Hi,

I want to set up my first Starwind HA Cluster with HyperV.
I have 2 storages, each with 2 RAIDs consisting of 12x600GB SAS (15Krpm), overall 24 disks.
I have created 6 targets in write-back mode, but I do not know if this is a good idea.
What happens when one node fails? If I understand it right, the data in the write-back cache will not be fully synchronised (5000ms expiry).
This will cause data loss, right?
So my question is: is write-through better in performance/stability (I think VMs do more read operations than writes?) in a Starwind HA set with a HyperV cluster?

Thanks for any answers.

Fri Jan 28, 2011 8:38 pm

Write-Thru makes sense for single-node setups only, and if you're not Tier2 storage (say, a backup process CAN fail in theory; no problem to re-run it). For an HA configuration, stick with Write-Back mode. If one node fails, StarWind will flush the cache very quickly and start working in non-cached mode, so that all transactions are written to disk before they are ACKed as "OK" to the writer.
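
To illustrate the behavior described above, here is a minimal toy model in Python. It is only a sketch of the idea (acknowledge from cache while both nodes are up, then flush and acknowledge only after the disk write once the partner is gone). The class `CachedTarget` and all of its methods are hypothetical names made up for this example; they are not StarWind's implementation or API.

```python
class CachedTarget:
    """Toy model of one node's write cache (hypothetical, not StarWind code)."""

    def __init__(self):
        self.disk = {}          # blocks that are safely on disk
        self.dirty = {}         # blocks acknowledged but still only in cache
        self.write_back = True  # normal operation while both nodes are up

    def write(self, block, data):
        if self.write_back:
            self.dirty[block] = data  # ACK from cache, flush later
        else:
            self.disk[block] = data   # ACK only after the disk write
        return "OK"

    def flush(self):
        # Move everything that is still cached onto disk.
        self.disk.update(self.dirty)
        self.dirty.clear()

    def on_partner_failure(self):
        # Flush the cache, then stop caching writes (non-cached mode).
        self.flush()
        self.write_back = False


node = CachedTarget()
node.write(1, b"a")        # cached write while the partner is alive
node.on_partner_failure()  # partner node goes down
node.write(2, b"b")        # from now on written straight to disk
```

After the simulated failure nothing remains in `dirty`, and every later write lands in `disk` before it is acknowledged.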

starneo · Mon Jan 31, 2011 8:17 am

Thanks for the answer, exactly what I wanted to know.

Mon Jan 31, 2011 9:23 am

Good. Please keep us updated about your progress. Thanks!

starneo · Mon Jan 31, 2011 10:30 am

Hi,

I now have a new problem.
Over the weekend I shut down both storages, because they are in a simple office room during the installation.
There they just produce heat, and it's quite unsafe to leave them running ...
Today I started the first storage, waited until the Starwind service came up, and then started the second one.

The first problem I encountered was that all 6 targets were out of sync on both storages.
No problem so far, because there is no data on them. I started a full sync from node A to B (all 6 targets). The storages are still syncing (10TB) via 10GbE.

Here are my questions/problems:
Today I wanted to set up the HyperV cluster itself, so I thought, no problem, let the storages sync ... But I am not able to connect from my HyperV nodes to the targets that are still syncing! The iSCSI initiator says "the service is not available".
I hope I am doing something wrong, but not being able to use the cluster just because of a full sync is not good for business ...

That brings me to the second question: is there a graceful shutdown available for Starwind? The reason is simple. In a situation like the one above, I want to gracefully shut down storage B, so that storage A knows it is alone now. Then I want to shut down storage A. When storage A starts again, it should know that it is still alone, and not tell me that all targets on it are out of sync (primary and partner targets). That is the situation I have now. It might also come up when doing maintenance that requires every server to be shut down.

That brings me to my 3rd question: what happens if my UPS fails and both storages crash at the same time?

Mon Jan 31, 2011 10:48 am

1) Look... The whole idea of having a storage cluster is to feed storage to your clients w/o downtime. If you intend to switch both nodes off... just use a single node. Otherwise you'll have to synchronize content all the time. And this is ABSOLUTELY NORMAL.

2) Just take one node down, service it, bring it back, sync, then take the second node down, and so on. This is the right way to do what you want.

3) If both nodes are powered off... I guess you don't have any on-line storage nodes left. We'll introduce a triple-node configuration (and an extra node to handle async replication) with V6, so customers with truly mission-critical environments are recommended to go this way. But I don't think that's your case, as you took down all storage nodes b/c they generate heat.
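
The one-node-at-a-time procedure from point 2 can be sketched as a short Python routine. This is only an illustration of the ordering; `rolling_maintenance`, `service` and `wait_for_sync` are hypothetical names for this example, and the callbacks stand in for whatever manual or scripted steps you actually use.

```python
def rolling_maintenance(nodes, service, wait_for_sync):
    """Service nodes one at a time so that one node always stays online."""
    if len(nodes) < 2:
        raise ValueError("rolling maintenance needs at least two nodes")
    log = []
    for node in nodes:
        log.append(f"stop {node}")
        service(node)        # apply service time while the partner serves I/O
        log.append(f"start {node}")
        wait_for_sync(node)  # do not touch the next node before sync is done
        log.append(f"{node} synchronized")
    return log


# Placeholder callbacks; in real life these are your own procedures.
steps = rolling_maintenance(
    ["A", "B"],
    service=lambda node: None,
    wait_for_sync=lambda node: None,
)
for step in steps:
    print(step)
```

The important design point is the wait between nodes: the second node is only taken down after the first one reports that it is synchronized again.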

starneo · Mon Jan 31, 2011 11:05 am

Thanks for the fast answer, I understood that.

Mon Jan 31, 2011 11:08 am

Good

starneo · Wed Feb 02, 2011 12:14 pm

I have now established the full HyperV cluster and am doing some testing with it.
I copied some larger files from a HyperV node's local disks to a network share on the Starwind storage.
I saw the RAM load grow higher and higher; when the copy job finished, the RAM load dropped back to normal.
I know this has nothing to do with Starwind, but maybe someone can help me.
The problem is that the RAM load does not stop growing until it reaches 100%, and at that point the storage's performance drops dramatically.
I do not want to run into issues with that when the system goes live.
This also happens when copying ISO files from the SCVMM server to a new VM (normally I share the ISOs, but you never know :/ )
This whole thing looks like caching or something ...
In "normal" use (creating a VM, working with it and so on) it does not happen.
(And no, I do not mean the Starwind cache.)

Hopefully someone has an idea.

Wed Feb 02, 2011 2:32 pm

What you're talking about is Windows Cache & Memory Manager basics. In a nutshell, it's the way Windows works: it keeps as much in cache as possible and defers all possible writes to the Lazy Writer background thread. I don't think there's any way to "fix" it, as it's not actually broken. There are some very minor tricks you can do, but they are not going to change the behavior in general.
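
As one example of such a minor trick, here is a small Python sketch that copies a file in chunks and forces the written data to disk at regular intervals, so the amount of not-yet-written data stays bounded. The function name `copy_with_periodic_flush` and its default sizes are assumptions chosen for this example. It does not stop Windows from caching the data it reads, so it will not change the general behavior.

```python
import os


def copy_with_periodic_flush(src, dst,
                             chunk_size=4 * 1024 * 1024,
                             flush_every=64 * 1024 * 1024):
    """Copy src to dst, forcing data to disk every flush_every bytes."""
    copied = 0
    since_flush = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            since_flush += len(chunk)
            if since_flush >= flush_every:
                fout.flush()             # Python buffer -> operating system
                os.fsync(fout.fileno())  # operating system cache -> disk
                since_flush = 0
        # Final flush so nothing is left pending when the function returns.
        fout.flush()
        os.fsync(fout.fileno())
    return copied
```

The trade-off is throughput: every forced flush makes the copy wait for the disk, so a larger `flush_every` copies faster but lets more unwritten data pile up.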

starneo · Mon Feb 07, 2011 4:55 pm

During my testing I encountered a little problem.
When shutting down Starwind node B, there are no problems. I just start it again, it does a fast sync, and all is fine. Starwind node A stays good.
But when shutting down Starwind node A, a few seconds later node B crashes with a BSOD (0x000000a).
This has happened twice now; I will do some more testing on it.

Does anyone have an idea about this issue?

Mon Feb 07, 2011 5:11 pm

That's quite interesting!
Can you drop the dump to the support@starwindsoftware.com mailbox?
I've never seen this issue before

Mon Feb 07, 2011 8:48 pm

StarWind is a user-mode application and should not put the machine into a BSOD. In theory... In practice there is a bunch of broken drivers between us and the network hardware. So I would suspect the NIC driver, then the software firewall, then the antivirus software. So please 1) update the NIC drivers to the most recent ones and 2) uninstall (at least for now) all software firewall and antivirus software, just to see the difference. If that doesn't help, do what Max suggested: grab and zip both the StarWind logs and the system crash dumps (make sure you've enabled full kernel crash dumping; Google to see how to do it) and send them to support so we can find out what's broken on your system. Thanks!
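
For the crash dump setting, a small Python helper can build the command for you. On Windows the dump type is stored in the `CrashDumpEnabled` registry value under `CrashControl` (1 = complete memory dump, 2 = kernel memory dump, 3 = small memory dump). The helper name `crash_dump_command` is made up for this example, and it only builds the `reg add` command line; it does not run it. Treat it as a sketch: run the printed command from an elevated prompt and reboot, or set the same option in the Startup and Recovery dialog of System Properties.

```python
CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"

# Values of CrashDumpEnabled for the common dump types.
DUMP_TYPES = {"complete": 1, "kernel": 2, "small": 3}


def crash_dump_command(dump_type="kernel"):
    """Return the reg.exe argument list that selects the given dump type."""
    value = DUMP_TYPES[dump_type]
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "CrashDumpEnabled",
        "/t", "REG_DWORD",
        "/d", str(value),
        "/f",
    ]


# Print the command instead of executing it, so it can be reviewed first.
print(" ".join(crash_dump_command("kernel")))
```

Which dump type support needs is not stated in this thread, so ask before choosing; a complete dump is roughly as large as the installed RAM.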

starneo · Tue Feb 08, 2011 8:58 am

Thanks for the input.
First, here is my setup in detail:
-no AV program installed (no connection to the internet)
-no software firewall except for the one integrated in Windows
-drivers for all NICs are the latest

Starwind config:
-6 targets over all
-3 targets primary on node A
-3 targets secondary on node A
-3 targets primary on node B
-3 targets secondary on node B

After doing some testing I reproduced the issue, with one difference:

Both nodes were online. I started the HyperV cluster (3 servers so far) and started 10 VMs on it. In those VMs I ran some benchmarks to create some disk load.
After this both Starwind nodes had a RAM load of around 18GB. I shut the complete HyperV cluster down, then Starwind node A. Starwind node B did not have any problems.
So I started Starwind node A and the full HyperV cluster. I saw the Starwind service starting on node A, the HyperV cluster came online, and I saw the problem again. On my management server, where the Starwind console is installed (connected remotely), the console crashed!
On Starwind node A the service was still starting. Starwind node B had a RAM load of around 18GB; a few seconds later it dropped to around 1.8GB, and this time there was no BSOD, but the Starwind service had stopped!
The HA set is now out of sync.

Might this be because of how I configured my targets? I did this because I want some kind of active-active when doing I/O reads.

Tue Feb 08, 2011 10:24 am

You are running an absolutely normal scenario. So please enable kernel crash dumping and send everything you've managed to get (StarWind's own logs, kernel dumps and management console dumps) zipped to support@starwindsoftware.com so we can take a look. Thank you very much for your cooperation!