DavidMcKnight,
Are both paths between your clients and your Starwind boxes 10Gbit end-to-end, and are you using a separate 10Gbit connection for sync?
I'm doing some heavy failover testing at the moment, and I see three different performance levels: with my "A" node running by itself, with my "B" node running by itself, and with both running normally. This is entirely explained by my network:
client box: MS iSCSI initiator, single Intel 10Gbit NIC (actually a LOM!)
Switch A: mixed 10Gbit/1Gbit; 10Gbit to client box, 10Gbit to Starwind node A, 1Gbit to Switch B
Switch B: 10Gbit to Starwind node B, 1Gbit to Switch A
Starwind Node A: 10Gbit to Switch A, another 10Gbit to Starwind Node B
Starwind Node B: 10Gbit to Switch B, another 10Gbit to Starwind Node A
Both Starwind boxes: Starwind running inside a Hyper-V VM (Win2k8 R2), Intel NICs, Areca 1680ix-24 RAID and, for this test, 2x 7200rpm 2.5" SATA drives each in RAID 1. These are *not* great performers. The HA target uses Write Through caching, and in my test most of the reads fit in the cache, so the drives are mostly servicing writes. I've written about 3TB of randomly generated data to my test target over five days with no issues, using both random and sequential access patterns.
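To make the bottleneck in this layout explicit, here's a small Python sketch that finds the slowest link on the best client-to-node path (a widest-path search). The link list is my own transcription of the topology above and the node names are made up for illustration. It assumes the direct node A to node B link is a dedicated sync network that client traffic can't use, so it is left out.

```python
import heapq

# Client-facing iSCSI links from the topology above, in Gbit/s (illustrative names).
# The direct node A <-> node B link is deliberately omitted: it is assumed to be
# a dedicated sync network that client traffic cannot traverse.
LINKS = [
    ("client", "switch_a", 10.0),
    ("switch_a", "node_a", 10.0),
    ("switch_a", "switch_b", 1.0),
    ("switch_b", "node_b", 10.0),
]


def build_graph(links):
    """Undirected adjacency list: node -> [(neighbour, Gbit/s), ...]."""
    graph = {}
    for u, v, bw in links:
        graph.setdefault(u, []).append((v, bw))
        graph.setdefault(v, []).append((u, bw))
    return graph


def bottleneck(graph, src, dst):
    """Widest-path search: the best achievable minimum link speed from src to dst."""
    best = {src: float("inf")}
    heap = [(-float("inf"), src)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            return width
        if width < best.get(node, 0.0):
            continue  # stale heap entry, a wider route was already found
        for nxt, bw in graph.get(node, []):
            w = min(width, bw)  # a path is only as fast as its slowest link
            if w > best.get(nxt, 0.0):
                best[nxt] = w
                heapq.heappush(heap, (-w, nxt))
    return 0.0  # dst unreachable


graph = build_graph(LINKS)
for target in ("node_a", "node_b"):
    print(target, bottleneck(graph, "client", target), "Gbit/s")
```

With this layout it reports 10 Gbit/s to node A and 1 Gbit/s to node B. If you add the sync link to LINKS, node B suddenly looks 10 Gbit/s away, which is why it's worth being sure client and sync traffic really are on separate paths.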
When both nodes are operating, the client box peaks at about 15% utilisation.
When node A is operating by itself, the client box peaks at about 30% utilisation.
When node B is operating by itself, the client box peaks at about 8% utilisation.
Why the odd figures? MS MPIO is trying to balance the i/o across the two paths, but they are not equal: to reach node B, traffic has to cross the 1Gbit link between the switches. So it performs best when 100% of the i/o goes to the one server it can reach over a pure 10Gbit network.
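A rough back-of-the-envelope model of this, assuming MPIO behaves like a strict Round Robin that gives each path an equal share of the i/o (a simplification on my part, not a description of how the Microsoft DSM actually schedules requests): the slow path then paces the fast one, so the aggregate ceiling is the number of paths times the slowest path. The function and constant names are illustrative only.

```python
# Simplified model (assumption): strict Round Robin gives every path an equal
# share of the i/o, so the slowest path paces all of them.

CLIENT_NIC_GBIT = 10.0  # the client's single 10Gbit NIC caps everything
PATH_A_GBIT = 10.0      # client -> Switch A -> node A, 10Gbit all the way
PATH_B_GBIT = 1.0       # client -> Switch A -> 1Gbit inter-switch link -> node B


def round_robin_ceiling(paths_gbit: list[float]) -> float:
    """Upper bound (Gbit/s) when i/o is split evenly across all active paths."""
    return len(paths_gbit) * min(paths_gbit)


def as_nic_utilisation(gbit: float) -> float:
    """Fraction of the client NIC that a given throughput would use."""
    return min(gbit, CLIENT_NIC_GBIT) / CLIENT_NIC_GBIT


scenarios = {
    "both nodes": round_robin_ceiling([PATH_A_GBIT, PATH_B_GBIT]),
    "node A only": round_robin_ceiling([PATH_A_GBIT]),
    "node B only": round_robin_ceiling([PATH_B_GBIT]),
}

for name, gbit in scenarios.items():
    print(f"{name}: <= {gbit:g} Gbit/s, <= {as_nic_utilisation(gbit):.0%} of client NIC")
```

That gives ceilings of roughly 20%, 100% and 10% of the client NIC for both nodes, node A only and node B only. The measured 15% and 8% sit fairly close to their ceilings, while node A alone at 30% is well under 100%, which suggests something else (presumably the modest RAID 1 pair) is the limit in that case.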
This doesn't explain the behaviour you are seeing, but I would look at the following:
1) Your network topology. Do any clients have to go through a 1Gbit connection to get to your Starwind boxes?
2) Your MPIO policies - how are these distributing i/o between your HA targets?
3) Is HA sync (between Starwind boxes) going over a 10Gbit path? Is this a DIFFERENT path to the one used to talk to clients?
4) Starwind HA uses the Windows iSCSI initiator for sync. There is an on-request hotfix for this which cured some BSOD issues I was having, so you may want to try it:
http://support.microsoft.com/kb/979711/en-gb
5) Check your Intel NIC drivers... not fun, although I've found them orders of magnitude better than their RAID drivers...
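For point 3, here's a quick sanity check you could run against a hand-written inventory of your links: it flags any sync link slower than 10Gbit, and any link that carries both sync and client iSCSI traffic. The inventory format, the traffic labels and the function name are hypothetical; fill them in from your own setup.

```python
# Hypothetical inventory: (end, end, Gbit/s, traffic types carried by the link).
INVENTORY = [
    ("client", "switch_a", 10.0, {"iscsi"}),
    ("switch_a", "node_a", 10.0, {"iscsi"}),
    ("switch_a", "switch_b", 1.0, {"iscsi"}),
    ("switch_b", "node_b", 10.0, {"iscsi"}),
    ("node_a", "node_b", 10.0, {"sync"}),
]


def check_sync_links(links, min_sync_gbit=10.0):
    """Return a list of problems with the links that carry HA sync traffic."""
    problems = []
    for a, b, gbit, traffic in links:
        if "sync" not in traffic:
            continue  # only sync links are checked here
        if gbit < min_sync_gbit:
            problems.append(f"{a}-{b}: sync link is only {gbit:g} Gbit/s")
        if "iscsi" in traffic:
            problems.append(f"{a}-{b}: sync shares this link with client iSCSI")
    return problems


print(check_sync_links(INVENTORY) or "sync path looks OK")
```

For the layout I described it prints "sync path looks OK"; a shared or 1Gbit sync link would show up as one line per problem.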
cheers,
Aitor