Storage Performance Degradation

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 1:02 pm

I get a ton of these errors on my Starwind HA Device.

My setup is two StarWind servers and two clustered Hyper-V servers hosting VMs. I do NOT have jumbo frames enabled. The errors occur during write operations while I am trying to restore logs to a SQL DB.

High Availability Device iqn.2008-08.com.starwindsoftware:redstripe-logs, command "0x1a" execution time is 11422 ms (storage performance degradation)
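
To quantify these, a minimal Python sketch along these lines can tally the warnings per device. It assumes every warning follows the single-line format above; the log path is a placeholder, and for reference 0x1a is the SCSI MODE SENSE(6) opcode:

```python
import re
from collections import defaultdict

# Matches StarWind "storage performance degradation" warnings like the sample
# above. The pattern is inferred from that one sample line; adjust as needed.
PATTERN = re.compile(
    r'High Availability Device (?P<device>\S+), command "(?P<cmd>0x[0-9a-fA-F]+)" '
    r'execution time is (?P<ms>\d+) ms \(storage performance degradation\)'
)

def tally(log_path):
    """Count degradation warnings and track the worst execution time per device."""
    counts = defaultdict(int)
    worst = defaultdict(int)
    with open(log_path, errors="ignore") as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                dev = m.group("device")
                counts[dev] += 1
                worst[dev] = max(worst[dev], int(m.group("ms")))
    for dev in counts:
        print(f"{dev}: {counts[dev]} warnings, worst {worst[dev]} ms")

# Placeholder path -- point this at your actual StarWind log file.
tally(r"C:\Program Files\StarWind Software\StarWind\logs\starwind.log")
```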
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 1:02 pm

I forgot to mention that I get thousands of these messages.
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 3:08 pm

UPDATE:

I am also noticing this error:

High Availability Device iqn.2008-08.com.starwindsoftware:kalik-logs, synchronization Connection IP 10.254.4.102 with Partner Node iqn.2008-08.com.starwindsoftware:redstripe-logs lost

My two StarWind servers have 4 copper 10GbE links that I cross-connect to each other for sync; I do not have a switch between them. I also have a 1GbE link for heartbeat. I am noticing that these links occasionally drop: any of the 5 links drops randomly, then reconnects 5-6 seconds later.
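
To put timestamps and durations on those drops, here is a rough Python sketch (Windows ping syntax; the address list is an example seeded with the sync IP from the error above, so substitute your own sync and heartbeat IPs):

```python
import subprocess, time
from datetime import datetime

# Peer addresses to watch -- substitute your four 10GbE sync IPs
# plus the 1GbE heartbeat IP.
LINKS = ["10.254.4.102"]

def is_up(ip):
    """One ICMP echo with a 1-second timeout (Windows ping flags)."""
    r = subprocess.run(["ping", "-n", "1", "-w", "1000", ip],
                       stdout=subprocess.DEVNULL)
    return r.returncode == 0

down_since = {}
while True:
    for ip in LINKS:
        if is_up(ip):
            if ip in down_since:
                outage = time.time() - down_since.pop(ip)
                print(f"{datetime.now()} {ip} back after {outage:.1f}s")
        elif ip not in down_since:
            down_since[ip] = time.time()
            print(f"{datetime.now()} {ip} DOWN")
    time.sleep(1)
```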
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 4:00 pm

Also, I am using LSFS HA devices for a Hyper-V Failover Cluster. The problem really begins in earnest when I start restoring a SQL DB in a Server 2008 VM.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Fri Jul 24, 2015 8:34 am

Hi scourtney2000,

Storage performance degradation is an old issue; I've also had it in my production environment on a previous build. It is usually connected with L1 cache on older builds, so you face a tough decision: update your current StarWind version (if it is not the latest one) or disable L1 cache on these devices.

The other possible cause is an underlying RAID storage problem. Check your event log for RAID-related events to see whether there are any :(
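
For example, a quick Python sketch that shells out to the built-in wevtutil to pull recent storage errors from the System log. The provider names below are common storage drivers, not StarWind-specific, and may differ for your RAID controller:

```python
import subprocess

# Event providers that typically surface disk/RAID trouble; adjust for your
# controller's driver (e.g. an LSI/PERC-specific provider name).
QUERY = ("*[System[Provider[@Name='disk' or @Name='storport' or @Name='iScsiPrt']"
         " and (Level=1 or Level=2 or Level=3)]]")

# Newest 50 matching events, rendered as text.
out = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{QUERY}", "/f:text", "/c:50", "/rd:true"],
    capture_output=True, text=True)
print(out.stdout or "No matching disk/storport events found.")
```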
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Mon Jul 27, 2015 3:46 pm

I can confirm both of darklight's statements.

Could you provide us with an update? Did you have a chance to try darklight's recommendations?
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 30, 2015 11:09 pm

I have not tried disabling the L1 cache, because I already have the latest version.

My RAID does not appear to be generating any errors.

What is strange is that, so far, this issue only manifests itself during the restoration of a SQL DB.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Wed Aug 05, 2015 5:15 pm

Why don't you have jumbo frames enabled?

https://knowledgebase.starwindsoftware. ... important/

Symptom 1... maybe that is the case?
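
If you do enable them, verify that full-size frames actually survive the path end to end. A small Python sketch using Windows ping with the don't-fragment flag (8972 bytes of payload + 28 bytes of ICMP/IP headers = one 9000-byte jumbo frame; the IP is the sync address from the posts above):

```python
import subprocess

def df_ping_ok(ip, payload):
    """Send one non-fragmentable ICMP echo of the given payload size (Windows flags)."""
    r = subprocess.run(["ping", "-n", "1", "-f", "-l", str(payload), ip],
                       stdout=subprocess.DEVNULL)
    return r.returncode == 0

ip = "10.254.4.102"  # one of the sync link addresses from this thread
# 1472 fits a standard 1500-byte MTU; 8972 needs a full 9000-byte jumbo path.
for size in (1472, 4072, 8972):
    print(f"payload {size}: {'passes' if df_ping_ok(ip, size) else 'fragmented/dropped'}")
```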
Tarass (Staff)

Tue Aug 11, 2015 10:50 am

Hi everybody, any updates on that case?
Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Mon Sep 21, 2015 11:14 am

I can confirm that I also have the same problem (not thousands of messages though, just a few every few days) on both nodes of my hyperconverged 2-node setup (with two 10GbE links cross-connected for iSCSI and heartbeat). I have the latest version, with only L1 cache (I had some severe corruption problems with L2 cache enabled). Flat files.

I also get VSS errors from my Veeam backup application, which I assume are related.
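
One quick way to see whether VSS itself is unhappy is to check the writer states. A minimal Python sketch that parses the output of the built-in "vssadmin list writers" command (run from an elevated prompt; the parsing is approximate):

```python
import re, subprocess

# 'vssadmin list writers' must be run from an elevated prompt.
out = subprocess.run(["vssadmin", "list", "writers"],
                     capture_output=True, text=True).stdout

# Each writer block contains "Writer name:", "State:" and "Last error:" lines;
# report only writers whose last error is not "No error".
for block in out.split("Writer name:")[1:]:
    name = block.splitlines()[0].strip().strip("'")
    state = re.search(r"State:\s*(.+)", block)
    error = re.search(r"Last error:\s*(.+)", block)
    if error and "No error" not in error.group(1):
        state_txt = state.group(1).strip() if state else "unknown"
        print(f"{name}: {state_txt} -- {error.group(1).strip()}")
```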

Branin
Rajesh.Rangarajan
Posts: 6
Joined: Tue Jun 23, 2015 4:43 pm

Wed Sep 23, 2015 9:54 am

You can try disabling L1 cache - it solved the storage performance degradation issue for me. Good news - a new build should be released soon with some L1 cache improvements and tweaks :)

Also, I would recommend using L2 cache ONLY in write-through mode, as it will protect you from data corruption if the SSD drive fails.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Sep 25, 2015 1:33 pm

Even with a very fast all-flash back end, using RAM to absorb a HUGE amount of writes still does no good. The hardware RAID controller guys have the same issue so far.

To make a long story short: every setup should be fine-tuned by StarWind engineers. We're changing our support and installation policy to match that.

Also, you're right: we're boosting cache performance, and the upcoming version will have great improvements. The one currently in development has 4x better numbers, FYI.
Rajesh.Rangarajan wrote:You can try disabling L1 cache - it solved the storage performance degradation issue for me. Good news - a new build should be released soon with some L1 cache improvements and tweaks :)

Also, I would recommend using L2 cache ONLY in write-through mode, as it will protect you from data corruption if the SSD drive fails.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

boeiend
Posts: 4
Joined: Fri Apr 24, 2015 11:28 am

Sun Nov 01, 2015 8:27 pm

Rajesh.Rangarajan wrote:You can try disabling L1 cache - it solved the storage performance degradation issue for me. Good news - a new build should be released soon with some L1 cache improvements and tweaks :)

Also, I would recommend using L2 cache ONLY in write-through mode, as it will protect you from data corruption if the SSD drive fails.
Any time schedule for this new release? I had the exact same problem; disabling L1 cache solved it for me as well...
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Thu Nov 05, 2015 5:06 pm

Hello,

The new StarWind build will be available by the end of this week.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Thu Nov 05, 2015 9:03 pm

Yeah, we've all heard that here already... https://forums.starwindsoftware.com/vie ... =30#p24707

... over a month ago :(