Random suggestions and feedback for 5.7

Public beta (bugs, reports, suggestions, features and requests)

deiruch
Posts: 35
Joined: Wed May 25, 2011 12:16 pm

Fri May 27, 2011 8:33 am

Hello everyone

First time beta tester here. Here are some suggestions/feedback in no particular order:
  • On all wizard pages with option buttons: Cursor keys should select/change the options. For example when creating a new target I should be able to create it just by pressing Up/Down and Enter. All regular Windows wizards work like this (for example IE's first start wizard)
  • Get rid of the skins. Makes your product look like a toy.
  • Update the help page of "Creating new high availability device/Step 6". Sync and heartbeat-interface-Explanation is missing
  • "Add target wizard/Data synchronization channel parameters": When selecting "(none)" the partner interface should also switch to "(none)". Also I don't quite understand what the refresh buttons are there for.
  • Help should explain cache modes (write through vs. write back)
  • It would be nice for testers/first time users if "starwind" was saved already in the management UI. If you don't want to do it, at least describe it right there (or at least in the help). It took me some minutes to find out what the password is. It would be better to configure anonymous access by default (and a reminder to change the password) - because keeping the default account/password doesn't improve security over using no credentials at all.
  • How can I change caching settings of a target after its creation?
  • When scrolling logs with the scroll wheel the cursor should scroll with the text (not move relative)
  • The first initial sync of the first target I created failed (logs in initial-ha-sync-failed.zip).
  • Rename the label "Cache size in mb" to "Maximum cache size (mb)". This change would remove the need of the descriptive label below it.
  • The distinction between "mirror (raid-1) device" and "high availability device" is a bit unclear. Neither the manual nor the UI provide any guidance as to when one should choose one over the other.
  • The "About high availability devices" topic is empty in help
  • Can I resize (HA) targets after creation?
  • The PDFs on the website should not be secured. I wanted to copy things like the MPIO hardware IDs but found myself unable to do so.
  • Why do you write all zeros to the image file when initializing an image file? NTFS guarantees that a newly created file is already zeroed. Maybe the wizard is just unclear and I selected the wrong options. This problem occurs with the default options.
  • The service startup should not be set as "Automatic (delayed)" but instead use the service notification functionality in Vista/7/2008 so the service is started as soon as the network stack is ready. See http://msdn.microsoft.com/en-us/library/dd405512.aspx. If you don't want to do this, just start the service and wait then for the availability of the network stack.
  • The management console should save the layout/splitter positions.
  • The eventlog should be sorted by time, newest first. The windows eventlog is sorted the same way.
  • Usability issue: It is non-obvious that syncing HA nodes works with the normal iSCSI connections. I disconnected the target and re-added it (to add it as a favourite target). Disconnecting the target broke the sync. It would be helpful to
    a) Tell the user that iSCSI connections via the initiator are used by StarWind itself
    b) Add targets as favorites by default
  • All the PDFs explaining Microsoft Failover Cluster things on the website: Bringing the disk online is not necessary. It is enough to initialize and format the disk on one node. Bringing the disk online on other nodes will corrupt NTFS and means that you have to reformat the disk again after bringing it online on all nodes.
  • The selected performance counter should stay selected when switching between servers. I'd like to compare the performance of two servers - this is cumbersome to do with the current UI.
  • An overview page of all important performance counters combined on a single screen (not necessarily in a single graph) would be nice. Some sort of a performance dashboard. Something like the first page of the "Resource Monitor" in Windows.
  • In stats: "mbps"? Is that mb/s or megabits/s? On the disks? Over the network? Is sync traffic included?
  • In stats: The most recent data point is always 0. Hide it.
  • On HA: Your current model of HA is broken. There's *NO* reliable way for HA failover with a two-node cluster. Adding additional, redundant links just makes the problem less obvious. You need a third witness node, a quorum or only do failover on shutdowns. Automatic failover otherwise *WILL* produce a split brain scenario under some circumstances. Is there a way to configure StarWind to only fail over when the primary node is shut down in a controlled way? And is there a way to do manual failover?
All those things aside, StarWind is a great product! Can't wait for the release with more than two HA nodes.

Cheers,
Simon
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun May 29, 2011 1:53 pm

OK, my comments are embedded into original post with ANTON tags. Everybody's opinion is welcomed!
deiruch wrote:Hello everyone

First time beta tester here. Here are some suggestions/feedback in no particular order:
  • On all wizard pages with option buttons: Cursor keys should select/change the options. For example when creating a new target I should be able to create it just by pressing Up/Down and Enter. All regular Windows wizards work like this (for example IE's first start wizard)

    Anton: OK, makes sense!
  • Get rid of the skins. Makes your product look like a toy.

    Anton: Question of taste entirely... I love skins (and so do many people). And theme / skinning engine helps us to provide custom version for a lot of companies. We would consider DISABLING them "by default" however.
  • Update the help page of "Creating new high availability device/Step 6". Sync and heartbeat-interface-Explanation is missing

    Anton: Accepted!
  • "Add target wizard/Data synchronization channel parameters": When selecting "(none)" the partner interface should also switch to "(none)". Also I don't quite understand what the refresh buttons are there for.

    Anton: Makes sense!
  • Help should explain cache modes (write through vs. write back)

    Anton: Assumed people know but there should be some StarWind specific stuff inside (no WB cache with single node and so on). So we'd update this for sure!
  • It would be nice for testers/first time users if "starwind" was saved already in the management UI. If you don't want to do it, at least describe it right there (or at least in the help). It took me some minutes to find out what the password is. It would be better to configure anonymous access by default (and a reminder to change the password) - because keeping the default account/password doesn't improve security over using no credentials at all.

    Anton: I think we'll go blank/blank as login/password (and keep whole default ones from the old versions not to confuse old users). And would ask for applying changes if left blank each time.
  • How can I change caching settings of a target after its creation?

    Anton: You cannot do it with V5.7 but with V5.8 we're going to introduce an Advanced Cache Manager allowing users to toggle properties "on-the-fly". So valid point, thank you :)
  • When scrolling logs with the scroll wheel the cursor should scroll with the text (not move relative)

    Anton: OK!
  • The first initial sync of the first target I created failed (logs in initial-ha-sync-failed.zip).

    Anton: OK, we'll check it out...
  • Rename the label "Cache size in mb" to "Maximum cache size (mb)". This change would remove the need of the descriptive label below it.

    Anton: Makes sense!
  • The distinction between "mirror (raid-1) device" and "high availability device" is a bit unclear. Neither the manual nor the UI provide any guidance as to when one should choose one over the other.

    Anton: You're correct. People always confuse them between each other.
  • The "About high availability devices" topic is empty in help

    Anton: We'll fix this!
  • Can I resize (HA) targets after creation?

    Anton: You cannot do it with V5.7 but we'll fix it with V5.8 for sure (adding extra LUN, it's not very easy to just grow the whole thing up).
  • The PDFs on the website should not be secured. I wanted to copy things like the MPIO hardware IDs but found myself unable to do so.

    Anton: Do you mean you should be able to copy text & images from PDF?
  • Why do you write all zeros to the image file when initializing an image file? NTFS guarantees that a newly created file is already zeroed. Maybe the wizard is just unclear and I selected the wrong options. This problem occurs with the default options.

    Anton: NTFS like any other modern file system uses COW (Copy-on-Write) so file allocation happens delayed. If we did not touch the content initially, any huge write would be DOG slow, resulting in the iSCSI connection being dropped and the write failing on the initiator side because of a timeout. But indeed it's a user-controllable setting and you can toggle it from ON to OFF if you wish.
  • The service startup should not be set as "Automatic (delayed)" but instead use the service notification functionality in Vista/7/2008 so the service is started as soon as the network stack is ready. See http://msdn.microsoft.com/en-us/library/dd405512.aspx. If you don't want to do this, just start the service and wait then for the availability of the network stack.

    Anton: We want to be backward compatible with other OSes we run on top of (for example Windows 2003 Server R2, still widely used for experiments). But we'd consider pointed stuff as an option.
  • The management console should save the layout/splitter positions.

    Anton: True!
  • The eventlog should be sorted by time, newest first. The windows eventlog is sorted the same way.

    Anton: I think we'll make them tunable...
  • Usability issue: It is non-obvious that syncing HA nodes works with the normal iSCSI connections. I disconnected the target and re-added it (to add it as a favourite target). Disconnecting the target broke the sync. It would be helpful to
    a) Tell the user that iSCSI connections via the initiator are used by StarWind itself
    b) Add targets as favorites by default

    Anton: You're correct here! We'll fix this!
  • All the PDFs explaining Microsoft Failover Cluster things on the website: Bringing the disk online is not necessary. It is enough to initialize and format the disk on one node. Bringing the disk online on other nodes will corrupt NTFS and means that you have to reformat the disk again after bringing it online on all nodes.

    Anton: OK, accepted!
  • The selected performance counter should stay selected when switching between servers. I'd like to compare the performance of two servers - this is cumbersome to do with the current UI.

    Anton: Accepted!
  • An overview page of all important performance counters combined on a single screen (not necessarily in a single graph) would be nice. Some sort of a performance dashboard. Something like the first page of the "Resource Monitor" in Windows.

    Anton: OK!
  • In stats: "mbps"? Is that mb/s or megabits/s? On the disks? Over the network? Is sync traffic included?

    Anton: Crazy thing :( We'll fix this to Mbyte(s)/sec or GByte(s)/sec to avoid confusion.
  • In stats: The most recent data point is always 0. Hide it.

    Anton: True.
  • On HA: Your current model of HA is broken. There's *NO* reliable way for HA failover with a two-node cluster. Adding additional, redundant links just makes the problem less obvious. You need a third witness node, a quorum or only do failover on shutdowns. Automatic failover otherwise *WILL* produce a split brain scenario under some circumstances. Is there a way to configure StarWind to only fail over when the primary node is shut down in a controlled way? And is there a way to do manual failover?
    Anton: NO IT IS NOT BROKEN! What you say was very true for StarWind versions 5.4 and prior. We did indeed rely on sync channel redundancy. But with V5.5 we introduced a special patent-pending algorithm allowing us to pick the "primary" head if all links are gone. So no brain split issue with StarWind. You're asking to bring in a third node but it cannot be done the way you describe - it's unprotected, so it represents a single point of failure. You're however 100% right on the three node design: it's like RAID5 vs. RAID6: three EQUAL nodes do provide better protection (you're still redundant with one node down) and still have write-back cache enabled (still running at full speed when one storage node of the cluster has gone AWOL). You'd see triple-node HA from StarWind Software pretty soon.


All those things aside, StarWind is a great product! Can't wait for the release with more than two HA nodes.

Anton: We're too :)

Cheers,
Simon
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

deiruch
Posts: 35
Joined: Wed May 25, 2011 12:16 pm

Sun May 29, 2011 10:56 pm

Wow, impressive amount of accepted feedback. I'm surprised :)
  • Get rid of the skins. Makes your product look like a toy.

    Anton: Question of taste entirely... I love skins (and so do many people). And theme / skinning engine helps us to provide custom version for a lot of companies. We would consider DISABLING them "by default" however.

    Simon: I do know that the "enthusiast" community likes things like that. Even I like skins (especially as I'm a ex-BeOS user). But no other "enterprise" software does this. It's as if the Exchange Management Console had skins... It's more a marketing/positioning issue. But it's not a big thing especially since the default skin is looking quite normal.
  • The PDFs on the website should not be secured. I wanted to copy things like the MPIO hardware IDs but found myself unable to do so.

    Anton: Do you mean you should be able to copy text & images from PDF?

    Simon: Just text. I wanted to copy "MSFT2005iSCSIBusType_0x9" for example.
  • Why do you write all zeros to the image file when initializing an image file? NTFS guarantees that a newly created file is already zeroed. Maybe the wizard is just unclear and I selected the wrong options. This problem occurs with the default options.

    Anton: NTFS like any other modern file system uses COW (Copy-on-Write) so file allocation happens delayed. If we did not touch the content initially, any huge write would be DOG slow, resulting in the iSCSI connection being dropped and the write failing on the initiator side because of a timeout. But indeed it's a user-controllable setting and you can toggle it from ON to OFF if you wish.

    Simon: Sounds very reasonable.
  • On HA: Your current model of HA is broken. There's *NO* reliable way for HA failover with a two-node cluster. Adding additional, redundant links just makes the problem less obvious. You need a third witness node, a quorum or only do failover on shutdowns. Automatic failover otherwise *WILL* produce a split brain scenario under some circumstances. Is there a way to configure StarWind to only fail over when the primary node is shut down in a controlled way? And is there a way to do manual failover?

    Anton: NO IT IS NOT BROKEN! What you say was very true for StarWind versions 5.4 and prior. We did indeed rely on sync channel redundancy. But with V5.5 we introduced a special patent-pending algorithm allowing us to pick the "primary" head if all links are gone. So no brain split issue with StarWind. You're asking to bring in a third node but it cannot be done the way you describe - it's unprotected, so it represents a single point of failure. You're however 100% right on the three node design: it's like RAID5 vs. RAID6: three EQUAL nodes do provide better protection (you're still redundant with one node down) and still have write-back cache enabled (still running at full speed when one storage node of the cluster has gone AWOL). You'd see triple-node HA from StarWind Software pretty soon.

    Simon: Ok. Then how can the secondary node distinguish between a broken link to primary (A) and a dead primary server (B)? In case A it shouldn't accept writes, in case B the cluster should failover and the secondary node should accept writes. I can't see how you could distinguish between those two problems.

    Using quorum needs at least 3 voters (in the form of witness disks, nodes or any other resource). With any two votes the cluster can still work, so any one of the three devices can fail and there's no single point of failure. As long as there's a majority of the nodes/witnesses online it's safe to perform writes.
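
A minimal sketch of the majority rule described above, in Python. The function name `has_quorum` and the three-voter layout (two storage nodes plus one witness) are illustrative assumptions, not anything taken from StarWind.

```python
def has_quorum(votes_online: int, total_voters: int) -> bool:
    """Return True when a strict majority of all voters is reachable."""
    return votes_online > total_voters // 2


# Three voters: two storage nodes plus one hypothetical witness.
for online in range(4):
    verdict = "writes allowed" if has_quorum(online, 3) else "writes refused"
    print(f"{online} of 3 voters online: {verdict}")
```

With only two voters, a single surviving node never holds a strict majority, which is why a lone node cannot tell a dead partner from a dead link by voting alone.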

    I'm honestly a bit surprised by your answer since I have yet to hear of a model that works with two nodes solely.
Clustering with multiple nodes is unbelievably difficult to get right with proper performance. At least it is for me. Just read the DRBD papers yesterday again... :)

Cheers,
Simon
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon May 30, 2011 4:35 pm

To say truth most of the things you've mentioned are already inside our bug tracker :)

1) Skins. Again, question of taste. Enterprise storage uses customized Web-based GUIs mostly so they are skinned as hell :) We're at least ideologically closer to EQL and LeftHand than to Symantec and maybe Microsoft.

2) PDF. Clear. Would be fixed.

3) NTFS & COW. Hope that helped. BTW, we're slightly moving away from RAW images to own format for deduplicated data. It also uses COW inside itself so "fill with zeros Vs. non-fill with zeros" question should be obsolete quite soon.

4) HA. Again I cannot help with reverse-engineering of the StarWind solution on my own public forum. Want things checked? Give them a try! Create an HA cluster, configure heartbeat, break all sync links and try to catch the "brain split" issue.

( I might not understand how gravity works either, but it does not stop working b/c I don't understand the way it works )

5) Very true. A 3 node cluster is 100 times more difficult to keep in sync and running at proper IOPS than a two node one. Guess why most of the companies are stuck with two nodes once and forever?

P.S. DRBD sucks. Hate "one way" tickets DRBD itself is.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

deiruch
Posts: 35
Joined: Wed May 25, 2011 12:16 pm

Tue May 31, 2011 5:34 pm

anton (staff) wrote:4) HA. Again I cannot help with reverse-engineering of the StarWind solution on my own public forum. Want things checked? Give them a try! Create an HA cluster, configure heartbeat, break all sync links and try to catch the "brain split" issue.
Been there, done that.

Setup:
Two servers connected over two networks A and B. A is used to manage the servers (no StarWind or iSCSI traffic there), B is used for Sync. Each server connects to its own StarWind instance via iSCSI.

What I did:
1. Created a 20 GB target. Let it sync, ...
2. Broke network B so no more traffic goes through
3. Brought the target online on the first server, copied some folder to the target, took it offline again
4. Brought the target online on the second server, created a new text file on the target, took it offline again
5. Fixed network B
6. Waited for some seconds until StarWind told me that everything's in sync again
7. Brought the target online on the first server again.

Result:
All data of step #3 was gone, only the file of step #4 was there.

Expected result:
1. Only a single node should accept writes after step #2
2. Resyncing should notify me about the split brain issue at step #6
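
The reproduction above can be modelled in a few lines of Python. This is only a toy model under stated assumptions: `Node`, `naive_resync` and `safe_resync` are hypothetical names, and "overwrite one side with the other" is an assumed resync behaviour that happens to produce the observed result, not StarWind's actual algorithm.

```python
class Node:
    """One storage node holding a set of file names (toy model)."""

    def __init__(self, name):
        self.name = name
        self.files = set()
        self.writes_since_sync = 0

    def write(self, filename):
        self.files.add(filename)
        self.writes_since_sync += 1


def naive_resync(source, target):
    """Overwrite target with source, silently discarding target's changes."""
    target.files = set(source.files)
    source.writes_since_sync = 0
    target.writes_since_sync = 0


def safe_resync(a, b):
    """Refuse to sync when both sides were written while disconnected."""
    if a.writes_since_sync and b.writes_since_sync:
        raise RuntimeError("split brain: both nodes accepted writes")
    # Copy from whichever side has changes to the side that has none.
    source, target = (a, b) if a.writes_since_sync else (b, a)
    naive_resync(source, target)


first, second = Node("first"), Node("second")
# The sync network is down from here on (step 2).
first.write("copied-folder")        # step 3
second.write("new-text-file.txt")   # step 4

try:
    safe_resync(second, first)      # step 6 with the expected safeguard
except RuntimeError as err:
    print("resync refused:", err)

naive_resync(second, first)         # step 6 as assumed above
print(sorted(first.files))          # the data from step 3 is gone
```

The guarded variant corresponds to expected result 2: it stops and reports the conflict instead of picking a winner.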

UPDATE: I just saw that you wrote "heartbeat". I'm testing this too right now. Give me some minutes. Despite that even without heartbeat StarWind should avoid the split brain configuration.

UPDATE 2: The very same problem occurs even with heartbeats enabled when you disconnect both the sync and the heartbeat network.
anton (staff) wrote:( I might not understand how gravity works either, but it does not stop working b/c I don't understand the way it works )
I just believe it's logically impossible. Not gravity, but a two node HA solution. :wink:
anton (staff) wrote:P.S. DRBD sucks. Hate "one way" tickets DRBD itself is.
Never used it in practice - I just read the papers. From a theoretical standpoint the solution is not too bad IMHO. But who cares anyway... I use StarWind! :D
hixont
Posts: 25
Joined: Fri Jun 25, 2010 9:12 pm

Tue May 31, 2011 6:58 pm

The only time I had a problem with the split brain issue with a two node 5.6 HA configuration was because of misconfigured links. I had my sync channel on the same physical NIC as the heartbeat channel. This was an oversight when I moved the sync channel from a 1GB NIC to the 10GB NIC. I forgot to move the heartbeat from the 10 GB NIC onto a different NIC. So when the 10 GB NIC failed it took the sync and heartbeat channels offline at the same time.

Once I reconfigured the channels I can take either physical NIC offline (on purpose or accidentally) and the SAN keeps working without data loss.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jun 02, 2011 10:52 am

Hold on for a second... We've introduced a special mechanism to avoid the brain split issue, you're not using it, and at the same time you're complaining that StarWind should not suffer from this evil even with heartbeat turned OFF?!? How do you see this working? It's like complaining to Nissan that their GT-R should brake fine with the brake system removed from the car :) What should these poor little souls use in such a case? Parachutes? Anti-gravity? Voodoo magic? Communism theory? :)

Kidding... We'll make properly configured heartbeat as a mandatory setting for HA. To avoid people not using it and telling something went wrong. Thank you again for extra confirmation we're on the right lane :)

Second case works as expected. If both sync network and heartbeat network are gone there's no way for us to know who's alive. I have a question to you but I'll ask it via e-mail if you don't mind as it could impact quite a lot asked in public.

Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jun 02, 2011 10:56 am

You absolutely need to have heartbeat and sync channels utilizing different hardware. H/B is not resource critical so even dedicated 1 GbE link should be fine.

P.S. We actually use ALL existing links for H/B so at least one of them should be alive.
hixont wrote:The only time I had a problem with the split brain issue with a two node 5.6 HA configuration was because of misconfigured links. I had my sync channel on the same physical NIC as the heartbeat channel. This was an oversight when I moved the sync channel from a 1GB NIC to the 10GB NIC. I forgot to move the heartbeat from the 10 GB NIC onto a different NIC. So when the 10 GB NIC failed it took the sync and heartbeat channels offline at the same time.

Once I reconfigured the channels I can take either physical NIC offline (on purpose or accidentally) and the SAN keeps working without data loss.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

deiruch
Posts: 35
Joined: Wed May 25, 2011 12:16 pm

Thu Jun 02, 2011 4:46 pm

Voodoo magic sounds mighty fine! :D

Seriously: The fact that a split-brain configuration is possible at all is not a good sign.

I tested single- and double-link configurations because the current beta allowed me both things. I'd expect some people to have a single "single point of failure" in both networks. It's especially obvious when you sync over the internet (VPN) and the internet connection goes down on one end. Both links going down at the same time is something you have to live with and plan for.

Got your mail, thanks!

Cheers,
Simon
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Jun 03, 2011 10:36 am

Absolutely love voodoo thing! Helps with no remote debugger attached and core dumped somewhere in the middle of Pacific :)

Hold on... There's no way to make a 100% unbreakable system. We can provide a system surviving a fault, a double fault, a triple fault, but we cannot make it infinite-fault tolerant. Having multiple sync channels, multiple nodes and multiple heartbeat routes (keep in mind: routing H/B traffic through client connections makes the whole system nearly completely immune to the S/B issue) brings the chances close to zero. But to avoid the referenced issue completely, the only way is to have non-symmetric roles in a cluster. Something which also sucks, but in a very different way :)

Yes, so our task is to make people avoid configuring bad things. And we'll definitely do it.
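
A rough back-of-the-envelope sketch of why extra routes shrink the risk without ever eliminating it, in Python. It assumes link failures are independent, and the 1% per-link figure and the name `all_links_down` are made up for illustration only.

```python
def all_links_down(per_link_failure: float, independent_links: int) -> float:
    """Probability that every link is down at once, assuming independent failures."""
    return per_link_failure ** independent_links


P_FAIL = 0.01  # hypothetical chance that any single link is down at a given moment

for links in (1, 2, 3, 4):
    print(f"{links} independent link(s): {all_links_down(P_FAIL, links):.0e}")
```

The independence assumption is the weak spot: sync and heartbeat channels that share one NIC behave like a single link in this model, and however many links are added the result gets smaller but never reaches exactly zero.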
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

DavidMcKnight
Posts: 39
Joined: Mon Sep 06, 2010 2:59 pm

Mon Jun 06, 2011 3:20 pm

Here's a couple comments for Starwind's GUI.

I've been having a real problem with the performance of my datastores. So I hoped I could use the new performance data tools in Starwind to help. First, and I think it's already been mentioned, the "Total Bandwidth" chart needs better labels, and I don't understand the numbers: "Latest 0,01", what is a "0,01" value? But what I really need is real time values. When I'm troubleshooting/testing I need to know what is going on right now. For numbers that have a "Per Second" value how about an option for the chart to update once a second. Also what's missing is the data that helps me figure out where the bottleneck is. I treat my datastores like race cars. I try to squeeze everything I can out of them. So I'm constantly tweaking them. But, from inside of Starwind, I can't tell what component (network, hard drives, CPU, etc.) needs tweaking the most.

Also, about what I see when I click on a target and then on its iSCSI Sessions: as I understand it, an iSCSI session is negotiated. So it would be useful if clicking on a particular session showed the details for that session (Initial R2T, First Burst Length, Max Burst Length, etc.), so I could make sure the clients and servers are configured correctly. I can never have too much information. I'd like to see all the session settings the client (VMware Host) wanted to have, all the settings Starwind wanted to use, and what settings were agreed upon.

There are several threads in the forums describing how to get a little bit more from Starwind (GlobalMaxTcpWindowSize, TcpWindowSize, Tcp1323Opts, SackOpts, etc.). Most involve digging around in the registry. There really should be a way to apply those settings from inside of Starwind's GUI.

Lastly, I'm curious about the Starwind service itself. I see it's set to Delayed Start. Why is that? I mean, when I start up my iSCSI server, I want it up and running right away. I find myself rebooting the server, for whatever reason, then once I can log on, I go to Services and start the service myself, not wanting to wait 2 or 3 minutes for the service to start as it is currently configured. So why is it set to delayed, what exactly is it waiting on?
Aitor_Ibarra
Posts: 163
Joined: Wed Nov 05, 2008 1:22 pm
Location: London

Tue Jun 07, 2011 12:51 pm

Just responding to David about the delayed start thing...

Firstly, what I'm saying here is opinion, not fact - I could be very wrong!

Starwind runs on top of Windows and thus has dependencies on other Windows services. These in turn have dependencies on other services. And Windows is a moving target (e.g. Windows Update). It's probably not realistic for Starwind to constantly stay on top of changes in Windows and alter the service startup dependencies. So, it's much safer to set to Delayed Start. This doesn't mean "wait ten minutes after the system has booted" - Delayed Start services should be started as soon as all the Automatic services have started. This means that everything that Starwind needs will definitely be running when it starts, moreover, as services start pretty quickly, it's unlikely that you'd see much improvement if the Starwind service was Automatic with dependencies set. Obviously, this is one of the reasons why it's a good idea to run Starwind on a dedicated windows installation, so that you aren't competing with other services starting up at boot (e.g. SQL, Exchange, Active Directory).
gstephenson
Posts: 31
Joined: Tue Feb 22, 2011 10:09 pm

Tue Jun 07, 2011 8:59 pm

anton (staff) wrote:You absolutely need to have heartbeat and sync channels utilizing different hardware. H/B is not resource critical so even dedicated 1 GbE link should be fine.

P.S. We actually use ALL existing links for H/B so at least one of them should be alive.
hixont wrote:The only time I had a problem with the split brain issue with a two node 5.6 HA configuration was because of misconfigured links. I had my sync channel on the same physical NIC as the heartbeat channel. This was an oversight when I moved the sync channel from a 1GB NIC to the 10GB NIC. I forgot to move the heartbeat from the 10 GB NIC onto a different NIC. So when the 10 GB NIC failed it took the sync and heartbeat channels offline at the same time.

Once I reconfigured the channels I can take either physical NIC offline (on purpose or accidentally) and the SAN keeps working without data loss.
Ok, now I'm worried! I am running v5.6 on two servers, and have 2 NICs in each server, and 3 HA targets hosting Hyper-v VMs in a cluster between the two servers.

First NIC has two 10Gb ports and two 1Gb ports. Three of the ports have crossover cables between my two servers.
One 10Gb port is the sync, and one 1Gb port is heartbeat. Second 10Gb port also has a crossover.

Second NIC has two ports, each connected to 100Mb/s LAN for client connections.

Since heartbeat and sync are on the same NIC, will I get the brain split issue or will I be saved by Starwind using all networks for heartbeat? I'm wondering about the scenario where I need to reboot one server to apply OS patches? ...But even if H/B was on a different NIC than sync, rebooting the server still takes both of them down, right? So then what?
deiruch
Posts: 35
Joined: Wed May 25, 2011 12:16 pm

Wed Jun 08, 2011 10:54 am

gstephenson wrote:Since heartbeat and sync are on the same NIC, will I get the brain split issue or will I be saved by Starwind using all networks for heartbeat? I'm wondering about the scenario where I need to reboot one server to apply OS patches? ...But even if H/B was on a different NIC than sync, rebooting the server still takes both of them down, right? So then what?
Don't be worried. A split brain scenario requires two working servers. It's only a problem if you have two running servers, initiators can connect to these servers and the sync & heartbeat channels are dead - all at the same time.
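
The condition above written out as a small Python predicate, as a sketch only. The function and parameter names are made up for illustration, and it encodes nothing more than the rule stated in this post.

```python
def split_brain_possible(nodes_running: int,
                         initiators_connected: bool,
                         sync_alive: bool,
                         heartbeat_alive: bool) -> bool:
    """True only when all conditions for a split brain hold at the same time."""
    return (nodes_running >= 2
            and initiators_connected
            and not sync_alive
            and not heartbeat_alive)


# Rebooting one server for patches: both channels drop, but only one node runs.
print(split_brain_possible(1, True, False, False))

# Both servers up and serving clients while sync and heartbeat are both dead.
print(split_brain_possible(2, True, False, False))
```

A reboot takes the channels down together with the node itself, so the first condition fails and the surviving node is the only writer.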
hixont
Posts: 25
Joined: Fri Jun 25, 2010 9:12 pm

Wed Jun 08, 2011 4:01 pm

In my scenario the NIC with the HB and Sync channels went offline, but both of my servers were still live and "thinking" they were now the only survivor of a failed pair.