StarWind iSCSI SAN V5.5 Build 20100831

Public beta (bugs, reports, suggestions, features and requests)

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

Bohdan (staff)
Staff
Posts: 435
Joined: Wed May 23, 2007 12:58 pm

Thu Sep 02, 2010 10:33 am

Dear Beta-Team members,

StarWind V5.5 build 20100831 beta is available for download from the following link:

http://www.starwindsoftware.com/beta/St ... 100831.exe

and the license can be downloaded from this link:

http://www.starwindsoftware.com/beta/bu ... taTeam.swk

New features and improvements:

High Availability: Added heartbeat feature for synchronization channel state tracking.
High Availability: CHAP authentication support added.
GUI: Management Console updates.

Known issues:

High Availability: Errors with migration of an ESX virtual machine started after a StarWind HA node reboot.
IBVolume: Errors with snapshots for GPT disks.

We are very interested in your input on HA device test results.
Please do not hesitate to contact us should you have any questions.
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Tue Sep 07, 2010 10:52 am

I'm just starting to test this build, after having very good results with 20100821, which I tested for over five days, continuously writing random data and verifying it, and at the same time rebooting each starwind node once per hour.

EDITED

The new heartbeat feature, I thought, was to help prevent a split-brain scenario when the sync network goes down? It's not in the help file yet, but in any case it doesn't work for me. CORRECTION: DOESN'T ALWAYS WORK!


It's been a long time since I tested this scenario, which in the real world should be rare, but is possible.

What I did:

1) Set up new HA target using write-through cache, heartbeat and clients on one network, sync on another
2) Run a continuous data writing test (Bart's Stuff Test v5.1.4) from the initiator. This writes a randomly generated stream of data and then verifies it.
3) Disconnect the sync network (this simulates failure of the network between starwind nodes - e.g. switch, NIC, cable failure)

Expected behaviour
- The target would not go down - the starwind instance defined as primary would continue serving the target
- The starwind instance defined as secondary would stop serving its target
- On restoring the sync network, the secondary node would re-sync with the primary

Actual behaviour:
- The target went down
- Bart's Stuff Test reported an error because it couldn't see the drive any more
- Starwind says both nodes out of sync, have to force a full sync

CORRECTION: it worked when I disabled/enabled the sync NIC in Windows. When it didn't work, I disconnected the NIC from the Hyper-V switch (equivalent to pulling the wire out) - I run Starwind in Hyper-V VMs. I will need to do more testing...

Am I using the heartbeat in the wrong way? Do I have to configure MPIO on the initiator to be active/passive rather than active/active? Is caching incompatible with heartbeat?

thanks,

Aitor
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Tue Sep 07, 2010 12:28 pm

To add to above... there don't seem to be any problems if I use failover only (active/passive) MPIO policy on the initiator. I'm fine with this, but you probably want a warning somewhere as Round Robin (active/active) is the default on the MS initiator.

Another thing for the UI - it would be nice if the heartbeat interface was shown for both Current Server and Partner Server in Device Properties - currently only Partner Server is shown.
Bohdan (staff)
Staff
Posts: 435
Joined: Wed May 23, 2007 12:58 pm

Tue Sep 07, 2010 1:01 pm

Thank you, Aitor, for your help!!!
Yes, the purpose of the heartbeat is to provide an additional communication channel between the HA nodes, so that if the data sync channel fails, one HA node keeps serving clients while the other stops serving.

>>Am I using the heartbeat in the wrong way? Do I have to configure MPIO on the initiator to be active/passive rather than active/active? Is caching incompatible with heartbeat?
No, you are using it in the right way. It must work correctly in both cases, and it must work with caching enabled.
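
For readers following along, here is a purely illustrative sketch (not StarWind's actual code) of the decision an HA node can make when it combines sync-channel and heartbeat state, as described above; the function and its parameters are hypothetical:

[code]
# Purely illustrative sketch (NOT StarWind's implementation): how an HA node
# might combine sync-channel and heartbeat state to avoid split brain when
# only the synchronization link fails.

def decide_role(is_primary: bool, sync_link_up: bool, heartbeat_up: bool) -> str:
    """Return what this node should do with its target."""
    if sync_link_up:
        # Normal operation: both nodes serve and stay in sync.
        return "serve"
    if heartbeat_up:
        # Sync link is down but the partner is still alive: only the node
        # designated as primary keeps serving, the secondary stops, so the
        # two copies cannot diverge independently (no split brain).
        return "serve" if is_primary else "stop"
    # Neither channel works: the partner is presumed dead, so the surviving
    # node keeps serving and a sync happens when the partner returns.
    return "serve"


if __name__ == "__main__":
    # The scenario from the test above: sync NIC pulled, heartbeat still up.
    print(decide_role(is_primary=True,  sync_link_up=False, heartbeat_up=True))   # serve
    print(decide_role(is_primary=False, sync_link_up=False, heartbeat_up=True))   # stop
[/code]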

Could you please zip and send us StarWind logs from both HA nodes?
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Tue Sep 07, 2010 1:28 pm

just sent them!
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Sep 07, 2010 7:41 pm

OK, checking the stuff... We'll be back soon. Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Steve
Posts: 2
Joined: Sun Sep 12, 2010 7:08 pm

Sun Sep 12, 2010 7:18 pm

I am getting very slow performance on the resync. Only about 15% of the bandwidth is being used. The sync connection is a dedicated 1 GbE crossover cable. I have enabled jumbo frames but that does not seem to help.
Bohdan (staff)
Staff
Posts: 435
Joined: Wed May 23, 2007 12:58 pm

Mon Sep 13, 2010 8:16 am

Hi Steve,
What are the NTTTCP or iPerf test results for the sync channel?
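
If NTttcp or iPerf aren't to hand, a rough raw-TCP throughput check can be improvised; the sketch below is illustrative only (the port and roles are placeholders), and a proper benchmark tool will give more reliable numbers:

[code]
# Minimal raw-TCP throughput check for the sync link (a rough stand-in for
# NTttcp/iPerf). Run with "server" on one node and "client <sync-ip>" on the
# other. The port is a placeholder; bind it to your sync-channel NIC.
import socket, sys, time

PORT = 5001
CHUNK = b"\x00" * (1024 * 1024)   # 1 MB blocks
SECONDS = 10                      # duration of the test

def server():
    srv = socket.socket()
    srv.bind(("0.0.0.0", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    while conn.recv(65536):       # just drain whatever the client sends
        pass
    conn.close()

def client(host):
    sock = socket.create_connection((host, PORT))
    sent, start = 0, time.time()
    while time.time() - start < SECONDS:
        sock.sendall(CHUNK)
        sent += len(CHUNK)
    sock.close()
    elapsed = time.time() - start
    print(f"~{sent / (1024 * 1024) / elapsed:.0f} MB/s over the sync link")

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
[/code]
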
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Mon Sep 13, 2010 4:36 pm

Today I've done a little performance testing... on the latest beta StarWind_5.5_beta_20100821. Please note these were simple, quick tests and I didn't run the tests repeatedly and take averages etc. Very quick and dirty!

I've tested an HA target with WT cache and WB cache, and had some interesting results. I've also mucked about with the MPIO settings on the client. My network is asymmetrical, in that although the client and both starwind boxes have 10GbE, one of the paths has to go switch to switch, and this is via 1GbE.

The Starwind hosts are Hyper-V virtual machines, running on separate physical hosts.
The client is Windows 2008 R2.

The test iSCSI HA target was on 7.2k rpm SATA drives, RAID 1, on Areca 1680ix-24 RAID controllers with 4GB write cache. Again, there is some asymmetry here, as one node has this to itself, whereas the other node is sharing its Areca with a production Starwind 4.2 instance. So one server benefits a lot from the write cache on the Arecas, the other less so. For this reason I've pretty much ignored the write speed. Previous testing has shown that writes tend to be slower when using round robin MPIO, presumably because there was more disk thrashing.

All tests were done to fit into Starwind's cache (256MB). They were done with ATTO, overlapped i/o with a queue depth of 10.

1) I set the cache to WT. This means that writes are not cached, i.e. a write operation will not complete until it's on the disk. Recently read or written data will be in the cache, which can speed up reads.
MPIO policy: failover only (faster route active): up to 316MB/sec reads.
MPIO policy: round robin (both routes active): up to 130MB/sec reads.
MPIO policy: failover only (slower route active): up to 100MB/sec reads.

2) I set the cache to WB. This caches writes too.
MPIO policy: failover only (faster route active): up to 550MB/sec reads.
MPIO policy: round robin (both routes active): up to 205MB/sec reads.
MPIO policy: failover only (slower route active): up to 100MB/sec reads.

I'm not sure why WB cache speeds up reads more than WT. I personally wouldn't use WB cache in production because of the extra risk (even though I have a UPS!).

The drop in speed with round-robin MPIO is perfectly understandable to me, as even with caching, there would have been much more disk thrashing: writing to both starwind nodes simultaneously effectively makes the I/O non-sequential. This is compounded by the asymmetry of my network.
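
To spell out the WT/WB distinction described above (writes acknowledged only once they are on disk vs. acknowledged from RAM and flushed later), here is a toy sketch; it is conceptual only and has nothing to do with StarWind's real cache implementation:

[code]
# Illustrative-only sketch of write-through vs write-back caching.

class Cache:
    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.data = {}        # block -> payload held in RAM
        self.dirty = set()    # blocks not yet flushed to disk (WB only)

    def write(self, block, payload, disk):
        self.data[block] = payload        # later reads of this block hit RAM
        if self.write_back:
            self.dirty.add(block)         # WB: acknowledge now, flush later
        else:
            disk[block] = payload         # WT: acknowledge only once on disk

    def read(self, block, disk):
        if block in self.data:
            return self.data[block]       # cache hit: no disk I/O
        return disk.get(block)

    def flush(self, disk):
        for block in self.dirty:
            disk[block] = self.data[block]
        self.dirty.clear()


disk = {}
wb = Cache(write_back=True)
wb.write("blk0", b"data", disk)           # returns immediately, disk still empty
assert "blk0" not in disk
wb.flush(disk)                            # now it is on disk
assert disk["blk0"] == b"data"
[/code]
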
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Mon Sep 13, 2010 4:49 pm

Aside to above: until today I was doing a reliability test. 20100831 passed with flying colours.

For 6 days and 1 hour I've been testing using Bart's Stuff Test. Target was HA, WT cache, auto-resync. The test writes and verifies randomly generated data. Over the test period, 5.4TB was written and verified. Each Starwind node was set to reboot itself once per hour, with 30 mins between each node, so in the test period there were 145 reboots per node, therefore 290 resyncs. No problems whatsoever.
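
For anyone who wants to reproduce this kind of soak test without Bart's Stuff Test, a minimal write-and-verify loop might look like the sketch below; the target path, block size and pass size are placeholders:

[code]
# Minimal write-and-verify soak loop in the spirit of Bart's Stuff Test:
# writes random data to the HA target, reads it back and compares.
# TARGET_PATH is a placeholder for a file on the iSCSI-mounted volume.
import hashlib, os

TARGET_PATH = r"E:\soaktest.bin"   # placeholder: a volume backed by the HA target
BLOCK = 1024 * 1024                # 1 MB per write
BLOCKS = 256                       # 256 MB per pass

def one_pass():
    digests = []
    with open(TARGET_PATH, "wb") as f:
        for _ in range(BLOCKS):
            buf = os.urandom(BLOCK)               # randomly generated data
            digests.append(hashlib.sha1(buf).digest())
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())                      # make sure it really hit the target
    with open(TARGET_PATH, "rb") as f:
        for i in range(BLOCKS):
            if hashlib.sha1(f.read(BLOCK)).digest() != digests[i]:
                raise RuntimeError(f"verify failed at block {i}")

if __name__ == "__main__":
    passes = 0
    while True:                                   # run until interrupted
        one_pass()
        passes += 1
        print(f"pass {passes} OK")
[/code]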

If you are running Windows 2008 R2 on either the host or the client, and you get a BSOD, look in the Windows logs - you will find that the Windows iSCSI initiator was the culprit. There is a hotfix from Microsoft which cured this for me. It is documented as being aimed at issues with booting from iSCSI, but whatever it does, it has fixed my BSOD issues. See http://support.microsoft.com/kb/979711/en-gb. Starwind HA uses the Windows iSCSI initiator for sync; so as far as I'm concerned, this was a Microsoft bug rather than a Starwind one.

cheers,

Aitor
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sat Sep 18, 2010 5:17 pm

Oh, thank you very much for keeping us updated! BTW, it looks like you're a big fan of VM-ed StarWind :) In that case I think we can leak the Linux-based Hyper-V VSA to you. It's a little bit raw for a real public beta, but we're really interested in your personal feedback :) Let me know if you're interested. Thanks!
Aitor_Ibarra wrote:Aside to above: until today I was doing a reliability test. 20100831 passed with flying colours.

For 6 days and 1 hour I've been testing using Bart's Stuff Test. Target was HA, WT cache, auto-resync. The test writes and verifies randomly generated data. Over the test period, 5.4TB was written and verified. Each Starwind node was set to reboot itself once per hour, with 30 mins between each node, so in the test period there were 145 reboots per node, therefore 290 resyncs. No problems whatsoever.

If you are running Windows 2008 R2 on either the host or the client, and you get a BSOD, look in the Windows logs - you will find that the Windows iSCSI initiator was the culprit. There is a hotfix from Microsoft which cured this for me. It is documented as being aimed at issues with booting from iSCSI, but whatever it does, it has fixed my BSOD issues. See http://support.microsoft.com/kb/979711/en-gb. Starwind HA uses the Windows iSCSI initiator for sync; so as far as I'm concerned, this was a Microsoft bug rather than a Starwind one.

cheers,

Aitor
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Wed Nov 03, 2010 4:25 pm

When one of my servers came back from repair (finally, after about 2 months!), a 2TB target that I had on it needed a full sync, although normally a fast sync is OK. I think I've seen elsewhere on the board that if there's an outage for a long time and/or a lot of data changes on the surviving node, fast sync is blocked and you have to do a full sync instead.

Can we have it documented what the thresholds for this are?

E.g.

if you are down for more than x days, you have to do a full sync
or
if y% of the data changes on the working target, you have to do a full sync

It's quite important to know, especially as during a full sync the target is unavailable.

Thanks,

Aitor

ps any news on when there might be a new build?
Alex (staff)
Staff
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Wed Nov 03, 2010 5:24 pm

Aitor,
The general rule that allows fast synchronization is that the source node for the synchronization must be in the 'synchronized' state.
Also, when the fast sync log grows beyond a certain size, it becomes meaningless to execute a fast synchronization.
If the fast sync log has reached this size, a full synchronization is executed in that case too.
If a synchronization process has failed, then only a full synchronization can be executed next.

In version 5.4, fast sync is unavailable if the HA device has WB cache enabled.
In version 5.5, if the node with WB cache has been correctly shut down, the fast synchronization option remains available.
Best regards,
Alexey.
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Wed Nov 03, 2010 6:49 pm

Hi Alex,

This was with V5.5 Build 20100831. The target was WT, not WB. And the node I wanted to sync from was in the synchronised state. There would have been about 200GB difference between the nodes, as about 200GB was re-written every day; the target was being used for backups of several VMs.

You say:
If the fast sync log has reached this size, a full synchronization is executed in that case too.
That size is exactly what I want to know! So that if I ever get a period of extended downtime, I have an idea of when a fast sync is not going to be possible, as full syncs would need to be scheduled for a less disruptive time.

Thanks,

Aitor
Alex (staff)
Staff
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Mon Nov 08, 2010 6:03 pm

The fast sync log size is about 0.018% of the size of the device. Divide this size by 12 to get the number of write operations that can be done before the sync mode is changed to full sync.
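
As a worked example of that rule (assuming "2 TB" means 2 × 1024^4 bytes; adjust if you count in decimal terabytes), the sketch below estimates the limits for Aitor's 2TB target:

[code]
# Worked example of the rule above for a 2 TB device.
# Assumption: 2 TB is taken as 2 * 1024**4 bytes.

def fast_sync_limits(device_bytes: int):
    log_bytes = device_bytes * 0.018 / 100   # log is ~0.018 % of the device size
    max_writes = log_bytes / 12              # Alexey's rule: divide the log size by 12
    return log_bytes, max_writes

if __name__ == "__main__":
    two_tb = 2 * 1024 ** 4
    log_bytes, max_writes = fast_sync_limits(two_tb)
    print(f"fast-sync log: ~{log_bytes / 1024 ** 2:.0f} MiB")                      # ~377 MiB
    print(f"writes before full sync is forced: ~{max_writes / 1e6:.0f} million")   # ~33 million
[/code]
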
Best regards,
Alexey.