What to expect from free version

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Anatoly (staff), Max (staff)

thirdbird
Posts: 9
Joined: Tue Feb 14, 2017 8:21 am

Mon Jan 08, 2018 1:08 pm

I made 2 LUNs (2x target, 2x image), connected them with MPIO, and set them up as the MSCS quorum witness (1 GB) and Hyper-V VM role storage (128 GB). 2 servers = 2 nodes, 1 NIC for everything per server (lab test) on a single switch. I simulate failovers by simply pulling individual NICs, creating a full disconnect. Failover works, but I've had mixed results with WB (massively corrupted VM). I'd like to know the self-repair expectations before trying another round of tests tonight with WT disks. I'm not sure the problems during testing have been entirely WB-related, but I'm hoping so.

1. Should I expect self-repair after a node comes back online and both are online again?
I'm doing failover tests, and the VM gets more and more corrupted until it doesn't even start anymore. This was with WB; I'm going to try WT tonight. I'm just wondering if I need to take specific steps when a node comes back online, or if I can expect the services to fix it automagically as long as both nodes are alive and can talk to each other. They should know who has the latest data and keep data consistent, right? The silliest thing I saw was a node going offline and the LUN going into a failed state, making the cluster pointless and fragile.

I'm redoing all tests tonight with WT and would like to know what to expect from self-repair in the free version, so I can simulate failovers with the correct expectations in mind. I'm hoping to rely on the services to take care of their own health, as long as they can talk to each other.

I need the cluster to be able to fail over without corruption or any manual intervention if a node goes offline - any node. And I'd like the cluster to fix itself entirely when the node comes back online, effectively preparing it for another failover if needed. I can't have a node set itself as failed just because the other one disappears; that's the whole point of HA.
thirdbird
Posts: 9
Joined: Tue Feb 14, 2017 8:21 am

Mon Jan 08, 2018 8:06 pm

It was obviously all about WB.

I did a lot of testing tonight, and with WT I had no corruption at all, not a single one. There were just some "indexes for file 0x9" repairs, which I googled and found to be nothing of importance. I've tested single-node disconnects, shutdowns, instant power-offs, and starting the outdated node back up first.

For all the initial tests, the VM came back after 2-15 minutes (usually closer to the former). Starting the outdated node first after an instant power-off on both, however, halted everything until I started a resync check (no GUI, using PS scripts). That got everything back online within 3 minutes. When shut down gently at exactly the same time (as much as that's possible), they fixed themselves at startup every single time. There's only one requirement here: knowing which node has the updated information. Otherwise I seem pretty safeguarded as long as a UPS is up.

Even more remarkably: I gently shut down the first node, then wrote lots of data and transferred lots of files to the remaining one before gently shutting that one off as well. I then started the strongly outdated node first, waited until the cluster manager had detected it, and started the updated one 1-2 minutes later to try to annoy the cluster and VSAN on purpose - and they completely self-healed. I was pretty impressed by that.

Unrelated: I found it weird that I had slightly faster write speeds with WT than with WB. I obviously can't complain about that, I just didn't expect it. This is a low-spec system, though.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Jan 09, 2018 5:05 am

This isn't how things work... You pick one of these strategies:

1) Reliable heartbeat networks
2) Hardcoded "primary" node
3) Witness-based quorum

So if you plan to cut off all the networks between the nodes AND you don't use the "segregated" model (compute and storage nodes separated), so you can't route an extra heartbeat channel over the incoming client connectivity network, then you have to use (3).

Yes, with a transactional log, OR no write-back cache, OR a log-structured file system, you can expect automatic repair.
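The witness-based quorum idea in (3) can be sketched in a few lines. This is a minimal illustration of the general majority rule, not StarWind's actual implementation; all names are hypothetical:

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """A node keeps serving I/O only if it can reach a strict
    majority of the voters (data nodes + witness)."""
    return reachable_voters > total_voters // 2

# 2 data nodes + 1 witness = 3 voters.
print(has_quorum(2, 3))  # lost the partner but still sees the witness: True
print(has_quorum(1, 3))  # fully isolated node: False -> it stops, no split brain
print(has_quorum(1, 2))  # plain 2-node cluster after a disconnect: False on BOTH sides
```

The last line is why a 2-node setup with no witness and no surviving heartbeat path cannot arbitrate a full disconnect: neither side can form a majority on its own.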
thirdbird wrote:I made 2 LUNs (2x target, 2x image), connected them with MPIO, and set them up as the MSCS quorum witness (1 GB) and Hyper-V VM role storage (128 GB). 2 servers = 2 nodes, 1 NIC for everything per server (lab test) on a single switch. I simulate failovers by simply pulling individual NICs, creating a full disconnect. Failover works, but I've had mixed results with WB (massively corrupted VM). I'd like to know the self-repair expectations before trying another round of tests tonight with WT disks. I'm not sure the problems during testing have been entirely WB-related, but I'm hoping so.

1. Should I expect self-repair after a node comes back online and both are online again?
I'm doing failover tests, and the VM gets more and more corrupted until it doesn't even start anymore. This was with WB; I'm going to try WT tonight. I'm just wondering if I need to take specific steps when a node comes back online, or if I can expect the services to fix it automagically as long as both nodes are alive and can talk to each other. They should know who has the latest data and keep data consistent, right? The silliest thing I saw was a node going offline and the LUN going into a failed state, making the cluster pointless and fragile.

I'm redoing all tests tonight with WT and would like to know what to expect from self-repair in the free version, so I can simulate failovers with the correct expectations in mind. I'm hoping to rely on the services to take care of their own health, as long as they can talk to each other.

I need the cluster to be able to fail over without corruption or any manual intervention if a node goes offline - any node. And I'd like the cluster to fix itself entirely when the node comes back online, effectively preparing it for another failover if needed. I can't have a node set itself as failed just because the other one disappears; that's the whole point of HA.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Jan 09, 2018 5:08 am

No, it's not.
thirdbird wrote:It was obviously all about WB.

I did a lot of testing tonight, and with WT I had no corruption at all, not a single one. There were just some "indexes for file 0x9" repairs, which I googled and found to be nothing of importance. I've tested single-node disconnects, shutdowns, instant power-offs, and starting the outdated node back up first.

For all the initial tests, the VM came back after 2-15 minutes (usually closer to the former). Starting the outdated node first after an instant power-off on both, however, halted everything until I started a resync check (no GUI, using PS scripts). That got everything back online within 3 minutes. When shut down gently at exactly the same time (as much as that's possible), they fixed themselves at startup every single time. There's only one requirement here: knowing which node has the updated information. Otherwise I seem pretty safeguarded as long as a UPS is up.

Even more remarkably: I gently shut down the first node, then wrote lots of data and transferred lots of files to the remaining one before gently shutting that one off as well. I then started the strongly outdated node first, waited until the cluster manager had detected it, and started the updated one 1-2 minutes later to try to annoy the cluster and VSAN on purpose - and they completely self-healed. I was pretty impressed by that.

Unrelated: I found it weird that I had slightly faster write speeds with WT than with WB. I obviously can't complain about that, I just didn't expect it. This is a low-spec system, though.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

thirdbird
Posts: 9
Joined: Tue Feb 14, 2017 8:21 am

Tue Jan 09, 2018 8:14 am

Your defensive support strategy is ridiculous. WB absolutely destroyed my testing the first time, hands down; it's what happened, no question about it - IN THAT SCENARIO. Just say yes: in a scenario as bad as that, WB was the cause. That's how it worked, no matter how you try to politic your way around it. This defensive "wrong use" kind of support is arrogant, and it makes it straight up repulsive to even think about paying for it later if something goes wrong. Denying outcomes. WT ABSOLUTELY saved our second round of tests, and I applauded you for it; you should be positive towards me instead of being totally defensive about how hard I tested it.

1) Reliable heartbeat networks
Won't always be reliable in SMB environments with limited equipment, to the degree that it's interesting to test what happens when it's broken.

2) Hardcoded "primary" node
Cluster failover is dynamic in pri/sec roles. You seem to do this to a degree anyway, by default waiting for a manual sync from the correct node when the outdated node is started first. That's pretty good.

3) Witness-based quorum
I did not encounter any information about this, but I see how it can help. Only a single test scenario had use for it though: both nodes powered off suddenly, with the outdated node started long before the updated one.

I can easily see how at least one of these requirements plus a consistent UPS is needed when going with WB, but I don't dare use WB after testing it. It was flat-out dangerous without heavy safeguarding around it. WT actually performed remarkably well.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 11, 2018 11:07 am

thirdbird,

From your description of the test scenario with WB, it looks like you used only one network interface for everything, which makes me think the same interface was used for Sync and for Heartbeat. If my assumption is correct, then by taking down this link you ran into the classic split-brain issue. StarWind VSAN has been designed to work with at least 2 dedicated network links under the Heartbeat failover strategy. If you have only one link available, you need to choose Node Majority as the failover strategy in your StarWind setup. Right now, Node Majority is available from the GUI only. Script examples will be added a bit later on. Documentation on this is available at https://www.starwindsoftware.com/resour ... r-strategy
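The split-brain mechanics Boris describes can be shown with a toy model: when a single link carries both Sync and Heartbeat, "peer unreachable" is indistinguishable from "peer dead", so both nodes keep accepting writes and the replicas diverge. This is purely illustrative, not StarWind internals:

```python
# Toy model of split brain on a single shared Sync+Heartbeat link.
class Node:
    def __init__(self, name):
        self.name = name
        self.data = []        # this node's copy of the replicated data
        self.peer = None      # set after both nodes exist
        self.link_up = True   # the ONE link carrying Sync and Heartbeat

    def write(self, block):
        self.data.append(block)
        if self.link_up and self.peer:
            self.peer.data.append(block)  # normal synchronous replication

a, b = Node("A"), Node("B")
a.peer, b.peer = b, a

a.write("w1")                  # replicated: both nodes now hold w1
a.link_up = b.link_up = False  # the single shared link is pulled
a.write("w2")                  # each side keeps serving its own clients...
b.write("w3")
print(a.data)  # ['w1', 'w2']
print(b.data)  # ['w1', 'w3']  -> divergent replicas; resync can only corrupt
```

A dedicated heartbeat link breaks exactly this ambiguity: the nodes can still agree on a single survivor even when the sync path dies.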
thirdbird
Posts: 9
Joined: Tue Feb 14, 2017 8:21 am

Thu Jan 11, 2018 2:13 pm

Boris (staff) wrote:thirdbird,

From your description of the test scenario with WB, it looks like you used only one network interface for everything, which makes me think the same interface was used for Sync and for Heartbeat. If my assumption is correct, then by taking down this link you ran into the classic split-brain issue. StarWind VSAN has been designed to work with at least 2 dedicated network links under the Heartbeat failover strategy. If you have only one link available, you need to choose Node Majority as the failover strategy in your StarWind setup. Right now, Node Majority is available from the GUI only. Script examples will be added a bit later on. Documentation on this is available at https://www.starwindsoftware.com/resour ... r-strategy
Thank you, Boris.

The article states it doesn't work for 2 nodes (logically enough), but that a third, witness node would. I'm really wanting to just depend on the 2 nodes, though. I just really want to confirm whether my results are working as intended when exposing the setup to split brain like I have (on purpose, to see how it reacts). The only time my 2-node WT setup refused to go online and self-repair on its own was when I instantly powered both nodes off at the same time; that was fixed by running the checkHaSyncState script on the correct node. In all other scenarios with gentle shutdowns, even simultaneous ones, they self-repaired, and MSCS got back online to read its witness and fail over my test VM without corruption or any other problems.

Is this because they may have had the required ~500ms to figure out who the most updated node is before going down? I like research and thorough testing. I know 2-node is perhaps not optimal on its own, but a WT setup like this may work for us if a UPS to differentiate shutdowns is all that's needed for automatic repair; we don't necessarily need cache on that level. If you can say my results are working as intended, that would be very favorable before putting the first nail in a setup.

Anton already kind of stated it, so sorry if I'm repeating myself.
"OR no write back cache ... you expect to have automatic repair. "
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 11, 2018 4:10 pm

First of all, the linked article deals with the Node Majority failover strategy. For it to function properly, you need an odd number of nodes, which is not your case. So you need to stick to the Heartbeat failover strategy. This strategy, in turn, implies using a dedicated heartbeat link, which is not your case either, as I understand it :)
Earlier you mentioned it was not possible to ensure equipment stability in small business environments, but I would like to object to this. For heartbeat, you can use any standalone NIC link, even 100 Mbit if no 1+ Gbit links are available. The idea is to have a redundant network connection that does not depend on the Sync interface and thus will most likely stay operational even if the physical Sync NIC fails. This way, the nodes will still have a communication channel between them, and one of the StarWind nodes will be marked as not synchronized automatically if the synchronization link is lost. This will definitely keep you safeguarded against split brain, as only one node will stay operational. This is true for any setup, with either WB or WT cache, or even with no caching configured for StarWind devices.
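The arbitration logic described above can be sketched roughly as follows. This is a hypothetical decision function to illustrate the principle; the real product's survivor-selection rules may differ:

```python
# Sketch of heartbeat arbitration when the Sync link fails (illustrative only).
def on_sync_link_lost(heartbeat_alive: bool, node_priority: int,
                      peer_priority: int) -> str:
    if not heartbeat_alive:
        # No remaining channel to agree on a survivor: both nodes may
        # keep writing independently -> split-brain risk.
        return "split-brain risk"
    # The heartbeat link still works, so the nodes can agree: the
    # lower-priority node marks itself not synchronized and stops
    # serving I/O, leaving exactly one writable copy.
    return "serve" if node_priority > peer_priority else "not synchronized"

print(on_sync_link_lost(True, 2, 1))   # serve
print(on_sync_link_lost(True, 1, 2))   # not synchronized
print(on_sync_link_lost(False, 1, 2))  # split-brain risk
```

The key point is the first branch: without any independent heartbeat path, there is simply no information left to decide which node should stop.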
Anton already kind of stated it, so sorry if I'm repeating myself.
"OR no write back cache ... you expect to have automatic repair. "
I assume Anton meant that in case of a power outage you will lose the data contained in the WB cache, which is obvious and completely expected behavior unless you've got yourself covered with a UPS.
thirdbird
Posts: 9
Joined: Tue Feb 14, 2017 8:21 am

Thu Jan 11, 2018 5:46 pm

Thanks again for a fair reply, Boris. Any NIC may fail, as may its host bus, the motherboard driving both, or the PSU. That's what I meant earlier about SMB. Sorry, this thread got off to a messy start.

I'd just like to confirm what a representative at Spiceworks mentioned: that heartbeat only needs 100-500ms to make a pri/sec decision. I believe this is what made it possible for me to properly shut both nodes down within seconds of each other and have them auto-recover at boot. If so, a UPS will go a long way with WT. I was left wondering to myself if it could really be that good.

I'm playing with horrible scenarios here, so please don't think I'm criticising any functionality. It's just abusive testing and curiosity at this stage, and I don't expect anything special from it. I'm just trying to get it to show me its dark sides as well as the good ones, ideally with your input confirming whether a result was arbitrary or intended.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Jan 12, 2018 1:15 pm

thirdbird,

I understand you are giving your instance of StarWind a hard time, just to be sure it copes with the worst scenarios imaginable. Yet what I really meant is that driving a car with 4 wheels is much safer than driving the same car with only 3 wheels in place. :D The same applies to StarWind NICs - running StarWind with dedicated Synchronization and Heartbeat links is much safer than doing the same with a single NIC used for all purposes.