Raid 0 with Starwind HA?

Software-based VM-centric and flash-friendly VM storage + free version


nbarsotti
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Fri May 06, 2011 3:35 pm

Hello,
I am wondering if anyone here is running their disks in RAID0 while using Starwind HA. I know everyone has their own level of risk aversion, but I'm wondering if there is some "gotcha" I'm missing. Please let me know what you think about this setup and how risky it is.

2 x Starwind servers connected to an APC UPS system
Starwind HA configured
Starwind cache (write-back or write-through?)
RAID 0 disk array

From my understanding, this type of configuration could sustain either a single drive failure or a single server failure (a single drive failure basically creates a server failure), but not both.
When using Starwind HA, what happens to a write operation that has been written to a server's write-back cache when that server fails before the data reaches disk? Is that even allowed with HA? What does the Starwind service do when it can no longer access its drive? Thank you.
microfoundry
Posts: 15
Joined: Thu Jul 05, 2007 5:43 pm

Fri May 06, 2011 4:58 pm

You beat me to the punch... I have a new infrastructure going together and I was wondering the same thing.
Best Regards,

Terry G Phillips
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Fri May 06, 2011 6:16 pm

Simplistic risk calculation... I am not very good at maths, so please someone jump in and correct this if I make a mistake.

RAID0 probability of failure ≈ number of drives x probability of a single drive failing (a rough approximation; the exact figure is 1 - (1 - p)^n, which comes out slightly lower)

Then Starwind HA is effectively RAID1, which reduces your risk. You end up with something like RAID 0+1, which is going to be a little more risky than a two-drive RAID 1, but nowhere near as risky as a pure RAID 0.

So... let's say you go mad with cheap 2TB SATA drives, assume a 5% annual failure rate, and stick 8 of them into a RAID 0. The probability of that RAID going down within a year is roughly 40%.

A single drive fails - there's a roughly 40% chance that happens within a year. That works out to about a 0.1% chance of it happening on any given day, or roughly a 5.69% chance that it happens on a Saturday at some point during the year. The chance that a drive then fails in the other RAID0 before Monday, when someone can replace the failed drive, is 2 x 0.1% = 0.2%. Multiply the two together and you get roughly a 0.01% chance that, in a given year, you get two drive failures over a weekend, hosing all your data.
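For anyone who wants to plug in their own numbers, here is a minimal Python sketch of that same back-of-the-envelope reasoning. The 8-drive stripe, the 5% annual failure rate and the weekend repair window are just the assumptions used in the example above, not measured figures.

Code:

# Back-of-the-envelope sketch of the risk arithmetic above.
# All inputs are the illustrative assumptions from the post (8 cheap 2TB drives
# per node, 5% annual failure rate per drive, failed drive replaced on Monday),
# not real-world data.

drives_per_array = 8       # drives in each RAID0 stripe
drive_afr = 0.05           # assumed annual failure rate of a single drive

# Chance that a stripe loses at least one drive (and therefore all data) in a year.
array_afr_approx = drives_per_array * drive_afr              # the rough n * p figure
array_afr_exact = 1 - (1 - drive_afr) ** drives_per_array    # exact, slightly lower

# The "bad weekend" scenario: one array dies on a Saturday, and the partner
# array also dies before the drive can be swapped on Monday (two days later).
daily_array_failure = array_afr_approx / 365
fails_on_a_saturday = (52 / 365) * array_afr_approx
partner_dies_before_monday = 2 * daily_array_failure
weekend_double_failure = fails_on_a_saturday * partner_dies_before_monday

print(f"array AFR (n * p approximation): {array_afr_approx:.1%}")        # ~40%
print(f"array AFR (exact)              : {array_afr_exact:.1%}")         # ~34%
print(f"weekend double failure per year: {weekend_double_failure:.3%}")  # ~0.012%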

If a volume dies, then the HA volume on that Starwind node will die too, but it will be OK on the other node. Starwind itself will stay running (unless Windows/Starwind are running off the same volume!). You will have to recreate the volume from scratch, format it, shut your working HA partner down gracefully, shut down all your iSCSI clients, bring the working partner back up, delete the target (but not the img), copy the img across to the other server, and recreate the HA target in Starwind, choosing not to sync.

As you went for 8 x 2TB drives, and you've got a 10G connection between servers (let's assume), and it's RAID0 with each drive doing roughly 100MB/sec, you should be able to push around 800MB/sec and get that img copied across in about 6 hours. That's pretty much a day's downtime. What will that cost you, and is it worth the roughly 40% chance that it will happen once in a year?
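As a quick sanity check on that transfer-time estimate, here is a small Python sketch using the same assumed numbers (8 x 2TB drives at roughly 100MB/sec each, over a 10GbE link between nodes); the throughput figures are illustrative, not benchmarks.

Code:

# Rough estimate of how long a full .img copy takes after rebuilding a dead
# RAID0 volume. Drive count, per-drive throughput and link speed are the
# assumptions from the example above, not measured values.

TB = 1000 ** 4  # decimal terabyte, as drive vendors count it

img_size_bytes = 8 * 2 * TB          # the whole 16TB stripe backs one img
array_throughput = 8 * 100e6         # 8 drives x ~100 MB/s sequential
link_throughput = 10e9 / 8           # 10 Gbit/s expressed in bytes/s

effective_rate = min(array_throughput, link_throughput)   # slowest hop wins
copy_time_hours = img_size_bytes / effective_rate / 3600

print(f"effective rate: {effective_rate / 1e6:.0f} MB/s")  # ~800 MB/s (drive-bound)
print(f"full img copy : {copy_time_hours:.1f} hours")      # ~5.6 hours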

If you go for any form of rebuildable RAID (1, 5, 6, or 10) with a hot spare, you not only reduce the risk, but you also don't have to worry so much about someone being around to swap the drive... and it would take at least two drive failures on the same array before you have to worry about a full resync.

What if Starwind goes down but your RAIDs are OK? It depends on how long you are down: if it's within the fast sync window, then you will be up again with no downtime. I would say this is far less risky to your data than RAID0, but the probability of it happening is pretty much 100% (at least for planned downtime), because at some point you will need to update Windows and reboot.

Anyway, it all hinges on the number of drives and their reliability. Oh, and with those large-capacity drives, silent data corruption is a big risk too, and it grows with capacity.
nbarsotti
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Fri May 06, 2011 6:53 pm

Thank you for your insight. I am considering an array of 8 x Intel X25-M 160GB SSDs. Yes, I am running 10Gb NICs. The volume is only 1.1 TB. After 1 year of usage I still have 94% of the spare area available, so I don't think I'll wear these SSDs out under the current usage model anytime soon. What has piqued my interest the most is this paragraph:
Aitor_Ibarra wrote: If a volume dies, then the HA volume on that Starwind node will die too, but it will be OK on the other node. Starwind itself will stay running (unless Windows/Starwind are running off the same volume!). You will have to recreate the volume from scratch, format it, shut your working HA partner down gracefully, shut down all your iSCSI clients, bring the working partner back up, delete the target (but not the img), copy the img across to the other server, and recreate the HA target in Starwind, choosing not to sync.
Why would I have to take the running iSCSI target offline, delete the target, copy the IMG file, and recreate the target without a sync? Is StarWind not smart enough to recreate the missing IMG file? Is this true with v5.6?
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri May 06, 2011 9:15 pm

1) You should use a Write-Back Cache of the largest possible size (80% of free physical RAM) when running an HA configuration.

2) Striping is definitely OK for HA. Just don't stripe more than 2 disks, as it's too risky: a double fault (a disk failure on one node and, say, a CPU failure on the other) could render the whole storage cluster useless... Upcoming versions of StarWind will have 3 HA nodes, and in that case you could put many more than just two disks into RAID0 (a rough sketch of how the risk scales with stripe width follows below).

3) SSDs have smarter recovery than spindles, so you can stripe even more than just two of them right now.
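To put some rough numbers behind the stripe-width advice above, here is a purely illustrative Python sketch of how the double-fault risk grows with the number of disks per RAID0 in a two-node cluster. The per-drive failure rate, the exposure window and the catch-all failure rate for the partner node are made-up assumptions for illustration, not StarWind guidance.

Code:

# Rough illustration of why wider RAID0 stripes make the "double fault" scenario
# more likely in a 2-node HA cluster. All rates and windows below are assumed
# figures for illustration only.

drive_afr = 0.05              # assumed annual failure rate per drive
exposure_days = 3             # assumed time the cluster runs on a single node
other_node_fault_rate = 0.10  # assumed annual rate of "anything else" failing
                              # on the partner (PSU, CPU fan, OS, ...)

def stripe_afr(n_drives: int, afr: float = drive_afr) -> float:
    """Chance that an n-drive RAID0 stripe loses data within a year."""
    return 1 - (1 - afr) ** n_drives

for width in (2, 4, 8):
    # One node's stripe dies at some point in the year...
    node_loses_stripe = stripe_afr(width)
    # ...and the partner suffers any fault before the rebuild/resync finishes.
    partner_fault = (exposure_days / 365) * (stripe_afr(width) + other_node_fault_rate)
    print(f"{width}-drive stripes: cluster double-fault risk "
          f"~{node_loses_stripe * partner_fault:.3%} per year")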
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Mon May 09, 2011 2:15 pm

Hi nbarsotti,

I didn't realise you were talking about SSDs. The X25-M has been pretty good in my experience, and if you are keeping loads of spare area - either by just not having that many writes, or by over-provisioning - then you should be fine. Of course an SSD can fail for reasons other than write endurance, but I'd expect reliability as a whole to be much better than HDD. So RAID0 isn't such a risk, especially with Starwind HA. I would suggest you look at LSI RAID controllers, as they have some features that may help:
#1 is the ability to start migrating data off an SSD in a RAID0 to another drive if it starts having SMART errors. This is almost as good as having a hotspare on a non-R0 volume.
#2 is the FastPath feature (optional extra), which improves performance with SSDs (I'm not sure what the secret sauce is, but it does very well).
#3 is the CacheCade feature. This may or may not interest you - it allows you to use up to 512GB of RAID 0 SSD as a read cache for a larger HDD-based volume. Useful for when you run out of RAM for Starwind cache.

Volume death: if a RAID 0 volume goes, then so does the .img that Starwind stores on it. This means that fast sync will be impossible and you will have to do a full sync, but even that requires there to be an img at the other end. So you either have to copy the img across after recreating the volume, or choose a full sync when you recreate the HA target. Either way, the full .img is going to have to be transferred.

Anton: cpu failure?! That's got to be rare, even with my luck that's never happened to me!
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue May 10, 2011 3:03 pm

OK, "CPU failure" should be actually read as "CPU fan failure" :)

P.S. It's around 50 degrees in our server room now... The industrial air conditioning is not enough. And it's only mid-May.
Aitor_Ibarra wrote: Anton: cpu failure?! That's got to be rare, even with my luck that's never happened to me!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

nbarsotti
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Tue May 10, 2011 3:16 pm

Thank you for all the input. I might build 4 stripes of 2 drives rather than 1 stripe of 8 based on your recommendation.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue May 10, 2011 3:36 pm

Makes sense. If you manage to run performance tests and share the results with us, your feedback would be greatly appreciated!
nbarsotti wrote: Thank you for all the input. I might build 4 stripes of 2 drives rather than 1 stripe of 8 based on your recommendation.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
