High Availability Configuration?

Guest

Mon Jan 10, 2005 11:36 pm

Greetings,

We are looking for options to convert several (8+) servers from direct-attached storage to running off a full-fledged IP SAN. Implementation details aside (we will most likely use hardware HBAs to support booting from the SAN), we are very concerned about the stability and availability of the iSCSI target.

Do you have any implementation details or ideas for how StarWind could be run "Highly Available"? Basically, we just want to make sure that if we move 8 servers onto an IP SAN, we don't have all 8 crash if there are any hardware / software problems with the target. Is there a way to have it run active/passive or active/active with failover?

Thanks!
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Jan 11, 2005 8:51 pm

You never know until you try. StarWind has proven stable in the current build (at least there are no known fatal issues), but telling you we're 100% sure and have tested every possible configuration would be a lie. Just start with an evaluation on one test machine and convert the servers from DAS -> SAN one by one. You don't even need to pay until you're absolutely sure the software will be adopted!

Back to failover -- we do not provide this in our software. If one machine dies, there is no way for another copy of StarWind on a second machine to pick up the broken mirror volumes. You can create a RAID over multiple machines and this should work with some minor issues.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

stever@bitshop.com
Posts: 3
Joined: Sun Aug 07, 2005 6:46 am
Location: Ashburn, VA

Sun Aug 07, 2005 6:49 am

> You can create a RAID over multiple machines and this should work
> with some minor issues.

Can you give more detail on the "minor issues"?

I tried this. It seems that if I reset one of the StarWind servers, the drive goes to a failed state and Windows doesn't try to rebuild it when it comes back (?). Then, when I told it to regenerate, I had an inaccessible volume.

NOTE: I did the fail / reconnect DURING the initial format. It could be a Windows problem, but I figured I'd read through the forums instead of spending more hours testing. Hopefully you have some info about what the minor issues are that covers this.

Thank you,
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Aug 07, 2005 8:29 pm

Minor issues: MS has problems supporting dynamic disks over iSCSI (and all software RAIDs are dynamic disks in Windows 2000 and later). If you use the MS initiator with dynamic disks, please search the forums (the MS ones) for how to enable dynamic disk + iSCSI support. We will be:

1) adding dynamic disk support to the upcoming StarPort (for now, software RAIDs come AFTER the DMIO Veritas driver licensed by MS, so iSCSI RAIDs need to be rebuilt, which is a VERY long and painful process for a 100GB+ volume). We will eliminate this.

2) adding software RAID (starting with "mirror", aka RAID-1) to the upcoming StarWind. To Windows, the whole RAID would then look like a single SCSI LUN, not two SCSI LUNs plus a software stack over them.

Releases are scheduled for 2-4 weeks from today.

Hope this helped. Thanks!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

stever@bitshop.com
Posts: 3
Joined: Sun Aug 07, 2005 6:46 am
Location: Ashburn, VA

Mon Aug 08, 2005 12:43 am

I'm looking for a few basic functions. I think you are answering with these in mind, but let's just be sure:

1) If the target fails (not the initiator / client), I want the initiator to keep going as if nothing happened. When the target comes back up, it rebuilds somehow. It *SHOULD* be able to assume the data on the target didn't change while it was down and only replay what it missed (i.e. keep a list of changes somewhere).

2) I ONLY care about uptime and decreasing the possibility (eliminating, really) of data loss. RAID-1 is simplest and is quite satisfactory; there is no need to attempt RAID 5 across multiple targets.

3) Something like multipath should be able to, say, request reads in round robin (or by some other method) but send writes to ALL servers. This is handled by various block-level file systems in Linux, but on Windows this kind of block-level technology requires relatively pricey per-server licenses.

4) If something goes wrong, I want the initiator to abort both connections instead of risking getting the RAID out of sync. I realize this can happen due to either gross incompetence or drastic errors (i.e. reset one target, reset the other while rebuilding, etc.). However, these kinds of things should be thought out as much as possible.

5) I want to be able to snapshot the targets. I see this mentioned in various places as a capability of StarWind 2.4+ but don't see anywhere how to do it (although I haven't searched a lot for that yet).
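
To make points 1 and 3 concrete, here is a minimal Python sketch of the behaviour being asked for. It is not StarWind, StarPort, or Windows code: `Replica` and `Mirror` are hypothetical names, and each target is modeled as an in-memory dictionary rather than a real iSCSI session. Reads rotate between up-to-date replicas, writes go to every online replica, and writes missed by an offline replica are recorded so that only those blocks need copying when it returns.

```python
from itertools import count


class Replica:
    """Hypothetical stand-in for one iSCSI target: an in-memory block store."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True


class Mirror:
    """RAID-1 style mirror over several replicas (illustrative only)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # Per replica: block numbers written while that replica was offline.
        self.dirty = {r.name: set() for r in self.replicas}
        self._turn = count()

    def write(self, block_no, data):
        # Fail loudly rather than accept a write that no replica can store.
        if not any(r.online for r in self.replicas):
            raise IOError("no replica online; refusing the write")
        for r in self.replicas:
            if r.online:
                r.blocks[block_no] = data
                self.dirty[r.name].discard(block_no)
            else:
                self.dirty[r.name].add(block_no)

    def read(self, block_no):
        # Round robin over replicas that are online and current for this block.
        fresh = [r for r in self.replicas
                 if r.online and block_no not in self.dirty[r.name]]
        if not fresh:
            raise IOError("no up-to-date replica for this block")
        return fresh[next(self._turn) % len(fresh)].blocks[block_no]

    def resync(self, name):
        """Copy only the missed blocks to a returning replica; return the count."""
        target = next(r for r in self.replicas if r.name == name)
        source = next(r for r in self.replicas
                      if r is not target and r.online and not self.dirty[r.name])
        missed = self.dirty[name]
        for block_no in missed:
            target.blocks[block_no] = source.blocks[block_no]
        copied = len(missed)
        missed.clear()
        target.online = True
        return copied
```

With two replicas, taking one offline, writing two blocks, and then calling `resync` copies just those two blocks instead of the whole volume. A real implementation would also have to persist the change list, since losing it forces a full rebuild.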

As to the size of volumes, at this point we're sticking to relatively small (< 30 GB) volumes for our tests. A complete RAID rebuild is satisfactory if a target drops out. We shouldn't be dropping targets often on a LAN, but security fixes, upgrades, etc. have to happen sometime, and you can't shut down every client in order to upgrade the target.
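
As a rough sanity check on why a full rebuild is tolerable at this size, the copy time can be estimated from volume size and link speed. The sketch below is back-of-the-envelope arithmetic only: `rebuild_hours` is a hypothetical helper, gigabytes are taken as decimal (10^9 bytes), and the 0.7 efficiency factor is an assumed placeholder for protocol and disk overhead, not a measured figure.

```python
def rebuild_hours(volume_gb: float, link_mbit: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to copy a whole volume across the network.

    volume_gb  -- volume size in decimal gigabytes
    link_mbit  -- nominal link speed in megabits per second
    efficiency -- assumed usable fraction of the nominal link speed
    """
    volume_bits = volume_gb * 1e9 * 8
    usable_bits_per_second = link_mbit * 1e6 * efficiency
    return volume_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    for size_gb, speed in [(30, 1000), (30, 100), (100, 100)]:
        print(f"{size_gb} GB over {speed} Mbit/s: {rebuild_hours(size_gb, speed):.2f} h")
```

Under these assumptions, a 30 GB volume over gigabit Ethernet rebuilds in just under six minutes, while a 100 GB volume over a 100 Mbit/s link takes a little over three hours.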

Thanks,
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Aug 08, 2005 7:19 pm

1) Neither the MS nor the RDS iSCSI initiator code has a way to replicate data like this. What you really need in this particular case is RAID over iSCSI volumes. OK, we'll add dynamic disk reactivation code to StarPort (if you're going to use the MS iSCSI initiator, you'd better ask MS support how to do it; maybe you can solve your problem with their initiator code right now).

2) Exactly!

3) We're busy adding complete replication to StarWind. A single data stream would go from StarPort (or any other initiator) to the iSCSI target (StarWind, of course) and then be split between different datacentre nodes. So if one storage node goes down, the rest of the nodes would still have their records updated.

4) I don't understand much of this; however, we'll add some sort of code to provide an "abortive" instead of a "graceful" connect.

5) Valery will provide you with a special version for tests. Please drop a message to support@starwind.com and you'll get download details.
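
On point 4, one possible reading of "abortive" versus "graceful" is the standard TCP distinction: a graceful close finishes with the normal FIN exchange, while an abortive close resets the connection so the peer sees an error immediately. The Python sketch below illustrates only that general socket technique on a loopback connection. It assumes this is the intended meaning, it is not StarWind or StarPort code, and `abortive_close` and `peer_sees_reset` are hypothetical names.

```python
import socket
import struct


def abortive_close(sock: socket.socket) -> None:
    """Close a TCP socket with a reset (RST) instead of a graceful FIN."""
    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() drop any unsent
    # data and reset the connection. The "ii" layout matches the POSIX
    # struct linger; Windows uses two unsigned shorts instead.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def peer_sees_reset() -> bool:
    """Loopback demo: True if the other end observes a connection reset."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    try:
        abortive_close(client)
        try:
            server_side.recv(1)
        except ConnectionResetError:
            return True
        return False  # an empty read would indicate a graceful close
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print("peer saw reset:", peer_sees_reset())
```

The design point is that a reset surfaces as an explicit error on the other side rather than a quiet end of stream, which suits a policy of failing loudly when the mirror can no longer be kept in sync.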

If a complete RAID rebuild works fine for you, I don't see any problems at all!

Thanks!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
