Random suggestions and feedback for 5.7

Public beta (bugs, reports, suggestions, features and requests)

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Wed Jun 08, 2011 4:43 pm

hixont wrote:In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.
Yes, so in this scenario, does Starwind automatically try another NIC to find a heartbeat? (I have a second NIC for client connections.) This is what I understand Anton to be saying above.
deiruch
Posts: 35
Joined: Wed May 25, 2011 12:16 pm

Wed Jun 08, 2011 7:45 pm

It appears that it only uses the interfaces for sync and heartbeat (which is the right thing to do IMHO).
gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Wed Jun 08, 2011 8:19 pm

gstephenson wrote:
hixont wrote:In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.
Yes, so in this scenario, does Starwind automatically try another NIC to find a heartbeat? (I have a second NIC for client connections.) This is what I understand Anton to be saying above.
Anton: Can you please clarify!
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 10, 2011 7:23 pm

For charts and logging we'll accept your proposals. Thank you for your feedback!

For TCP settings we'd like to avoid any kind of registry tricks, for several reasons: they change frequently, some of them require a reboot, some are obsolete, etc.

Delayed start is required as we need to be Windows Server 2003 compatible and we rely on some of the network stack components to start before we can go.
DavidMcKnight wrote:Here's a couple comments for Starwind's GUI.

I've been having a real problem with the performance of my datastores. So I hoped I could use the new performance data tools in Starwind to help. First, and I think it's already been mentioned, the "Total Bandwidth" chart needs better labels, and I don't understand the numbers: "Latest 0,01", what is a "0,01" value? But what I really need is real-time values. When I'm troubleshooting/testing I need to know what is going on right now. For numbers that have a "Per Second" value, how about an option for the chart to update once a second? Also what's missing is the data that helps me figure out where the bottleneck is. I treat my datastores like race cars. I try to squeeze everything I can out of them. So I'm constantly tweaking them. But, from inside of Starwind, I can't tell what component (network, hard drives, CPU, etc.) needs tweaking the most.

Also, about clicking on a target and then on its iSCSI Sessions: as I understand it, an iSCSI session is negotiated. So it would help if I could click on a particular session and get to the details for that session (Initial R2T, First Burst Length, Max Burst Length, etc.), so I could make sure the clients and servers are configured correctly. I can never have too much information. I'd like to see all the session settings the client (VMware Host) wanted to have, all the settings Starwind wanted to use, and what settings were agreed upon.

There are several threads in the forums describing how to get a little bit more from Starwind (GlobalMaxTcpWindowSize, TcpWindowSize, Tcp1323Opts, SackOpts, etc.). Most involve digging around in the registry. There really should be a way to apply those settings from inside of Starwind's GUI.

Lastly, I'm curious about the Starwind service itself. I see it's set to Delayed Start. Why is that? I mean, when I start up my iSCSI server, I want it up and running right away. I find myself rebooting the server, for whatever reason, then once I can log on, I go to Services and start the service myself, not wanting to wait 2 or 3 minutes for the service to start as it is currently configured. So why is it set to delayed, and what exactly is it waiting on?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 10, 2011 7:24 pm

Pretty close to the truth...
Aitor_Ibarra wrote:Just responding to David about the delayed start thing...

Firstly, what I'm saying here is opinion, not fact - I could be very wrong!

Starwind runs on top of Windows and thus has dependencies on other Windows services. These in turn have dependencies on other services. And Windows is a moving target (e.g. Windows Update). It's probably not realistic for Starwind to constantly stay on top of changes in Windows and alter the service startup dependencies. So, it's much safer to set to Delayed Start. This doesn't mean "wait ten minutes after the system has booted" - Delayed Start services should be started as soon as all the Automatic services have started. This means that everything that Starwind needs will definitely be running when it starts, moreover, as services start pretty quickly, it's unlikely that you'd see much improvement if the Starwind service was Automatic with dependencies set. Obviously, this is one of the reasons why it's a good idea to run Starwind on a dedicated windows installation, so that you aren't competing with other services starting up at boot (e.g. SQL, Exchange, Active Directory).
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 10, 2011 7:33 pm

Wrong... You need to configure the H/B using dedicated hardware (NIC, wiring and an optional switch). The ideal setup is a pair of low-performing NICs (say 1 GbE, as bandwidth is not an issue and ping time is 1 second or so), teamed and using cross-over cabling (avoid a switch if possible). Or (even better) use the uplinks for iSCSI traffic as the H/B network. In that case no ping should effectively mean no route to the initiator, so no split brain could occur.

In upcoming versions we plan to introduce multiple H/B and sync channels, restore points (for the situation where everything goes badly wrong, which has a very, very low chance), automatic verification of proper configuration (you could still use VLANs to fool the StarWind checker, however), custom MPIO to avoid split brain completely, and multiple HA nodes doing majority voting.

Right now you need a double fault (no redundant sync channel and no working heartbeat channel) to break the whole thing. In upcoming versions StarWind should survive triple or quadruple faults, with a recovery point as your "last chance" disaster recovery. Solid as a rock!
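
To illustrate the "majority voting" idea mentioned above, here is a minimal sketch of a generic strict-majority quorum rule. This is an assumption for illustration only, not StarWind's implementation; the function name `has_quorum` and its parameters are hypothetical.

```python
def has_quorum(visible_peers: int, cluster_size: int) -> bool:
    """Generic strict-majority rule (hypothetical, not StarWind code).

    A node keeps serving writes only if it, plus the peers it can
    still reach, make up more than half of the cluster.
    """
    if cluster_size < 1 or not 0 <= visible_peers < cluster_size:
        raise ValueError("visible_peers must be in [0, cluster_size)")
    # +1 counts the node itself.
    return (visible_peers + 1) * 2 > cluster_size


# Three-node cluster: a node that still sees one peer keeps quorum,
# an isolated node does not.
print(has_quorum(visible_peers=1, cluster_size=3))
print(has_quorum(visible_peers=0, cluster_size=3))
```

Under this rule an isolated node in a two-node pair never has a strict majority (1 of 2), which is why a two-node setup has to rely on heartbeat and sync links rather than voting.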

Thank you for your feedback as it helps to make our software better!
gstephenson wrote:
anton (staff) wrote:You absolutely need to have heartbeat and sync channels utilizing different hardware. H/B is not resource critical so even dedicated 1 GbE link should be fine.

P.S. We actually use ALL existing links for H/B so at least one of them should be alive.
hixont wrote:The only time I had a problem with the split brain issue with a two node 5.6 HA configuration was because of misconfigured links. I had my sync channel on the same physical NIC as the heartbeat channel. This was an oversight when I moved the sync channel from a 1GB NIC to the 10GB NIC. I forgot to move the heartbeat from the 10 GB NIC onto a different NIC. So when the 10 GB NIC failed it took the sync and heartbeat channels offline at the same time.

Once I reconfigured the channels I can take either physical NIC offline (on purpose or accidentally) and the SAN keeps working without data loss.
Ok, now I'm worried! I am running v5.6 on two servers, and have 2 NICs in each server, and 3 HA targets hosting Hyper-V VMs in a cluster between the two servers.

First NIC has two 10Gb ports and two 1Gb ports. Three of the ports have crossover cables between my two servers.
One 10Gb port is the sync, and one 1Gb port is heartbeat. Second 10Gb port also has a crossover.

Second NIC has two ports, each connected to 100Mb/s LAN for client connections.

Since heartbeat and sync are on the same NIC, will I get the brain split issue or will I be saved by Starwind using all networks for heartbeat? I'm wondering about the scenario where I need to reboot one server to apply OS patches? ...But even if H/B was on a different NIC than sync, rebooting the server still takes both of them down, right? So then what?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 10, 2011 7:36 pm

With H/B properly configured (using the initiator path as an H/B channel) it would not happen. No H/B = no initiator connect = no writes = no brain split.

But we cannot force proper H/B configuration. And you're right, we should survive at least triple faults and provide fast wake up & restore in any case.
deiruch wrote:
gstephenson wrote:Since heartbeat and sync are on the same NIC, will I get the brain split issue or will I be saved by Starwind using all networks for heartbeat? I'm wondering about the scenario where I need to reboot one server to apply OS patches? ...But even if H/B was on a different NIC than sync, rebooting the server still takes both of them down, right? So then what?
Don't be worried. A split brain scenario requires two working servers. It's only a problem if you have two running servers, initiators can connect to these servers and the sync & heartbeat channels are dead - all at the same time.
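
As a rough sketch of the condition deiruch describes above, the check can be written as a single predicate. This is only an illustration under stated assumptions, not StarWind's actual logic; `PairState` and `split_brain_possible` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PairState:
    """Snapshot of a two-node HA pair (hypothetical model)."""
    node_a_up: bool
    node_b_up: bool
    sync_alive: bool
    heartbeat_alive: bool
    initiators_reach_a: bool
    initiators_reach_b: bool


def split_brain_possible(s: PairState) -> bool:
    """True only when all three conditions hold at the same time."""
    both_up = s.node_a_up and s.node_b_up
    links_dead = not s.sync_alive and not s.heartbeat_alive
    both_writable = s.initiators_reach_a and s.initiators_reach_b
    return both_up and links_dead and both_writable


# Rebooting one server for patches: only one node is running.
reboot = PairState(True, False, False, False, True, False)
# Sync and heartbeat on one failed NIC, client NIC still working.
shared_nic_failure = PairState(True, True, False, False, True, True)

print(split_brain_possible(reboot))
print(split_brain_possible(shared_nic_failure))
```

If the heartbeat runs over the initiator path, a dead heartbeat implies the initiators cannot reach the nodes either, so the third condition fails and the predicate stays false.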
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 10, 2011 7:37 pm

If you'd configure H/B to use initiator path you'd never have anything like that.
hixont wrote:In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 10, 2011 7:38 pm

I did! Please read the whole thread from the very beginning :)

P.S. It's all about clusters. StarWind itself is only a synonym of a "cluster" here.
gstephenson wrote:
gstephenson wrote:
hixont wrote:In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.
Yes, so in this scenario, does Starwind automatically try another NIC to find a heartbeat? (I have a second NIC for client connections.) This is what I understand Anton to be saying above.
Anton: Can you please clarify!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

hixont
Posts: 25
Joined: Fri Jun 25, 2010 9:12 pm

Fri Jun 10, 2011 8:31 pm

anton (staff) wrote:If you'd configure H/B to use initiator path you'd never have anything like that.
hixont wrote:In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.
I did, but they were on the same physical NIC. Like I said an oversight on my part when I made a network upgrade to support a 10GB synch link between the two servers. The fault was mine not the software.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sat Jun 11, 2011 11:41 am

Unless you've used VLANs so we could not distinguish NICs from each other I do consider this as being our issue.
hixont wrote:
anton (staff) wrote:If you'd configure H/B to use initiator path you'd never have anything like that.
hixont wrote:In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.
I did, but they were on the same physical NIC. Like I said an oversight on my part when I made a network upgrade to support a 10GB synch link between the two servers. The fault was mine not the software.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

hixont
Posts: 25
Joined: Fri Jun 25, 2010 9:12 pm

Mon Jun 13, 2011 4:08 pm

anton (staff) wrote:Unless you've used VLANs so we could not distinguish NICs from each other I do consider this as being our issue.
I'm using VLANs and a mixture of private and public IP addressing to segment my data, so I do not believe this to be a Starwind issue.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Jun 14, 2011 9:15 am

You're 100% correct! Using the same underlying hardware and creating VLANs is a perfect way to fool the StarWind configuration checker :))
hixont wrote:
anton (staff) wrote:Unless you've used VLANs so we could not distinguish NICs from each other I do consider this as being our issue.
I'm using VLANs and a mixture of private and public IP addressing to segment my data, so I do not believe this to be a Starwind issue.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

kmax
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Thu Jun 16, 2011 11:27 pm

So where we at with the official release of 5.7?
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 17, 2011 6:18 am

Nearly done. We still need a couple of extra days for the final checks.
kmax wrote:So where we at with the official release of 5.7?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
