Storage Performance Degradation

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 1:02 pm

I get a ton of these errors on my Starwind HA Device.

My setup is two StarWind servers and two clustered Hyper-V servers hosting VMs. I do NOT have jumbo frames enabled. The errors occur during write operations while I am trying to restore logs to a SQL DB.

High Availability Device iqn.2008-08.com.starwindsoftware:redstripe-logs, command "0x1a" execution time is 11422 ms (storage performance degradation)
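
To quantify these, a minimal Python sketch along these lines can tally the warnings per device. It assumes every warning follows the single-line format above; the log path is a placeholder, and for reference 0x1a is the SCSI MODE SENSE(6) opcode:

```python
import re
from collections import defaultdict

# Matches StarWind "storage performance degradation" warnings like the sample
# above. The pattern is inferred from that one sample line; adjust as needed.
PATTERN = re.compile(
    r'High Availability Device (?P<device>\S+), command "(?P<cmd>0x[0-9a-fA-F]+)" '
    r'execution time is (?P<ms>\d+) ms \(storage performance degradation\)'
)

def tally(log_path):
    """Count degradation warnings and track the worst execution time per device."""
    counts = defaultdict(int)
    worst = defaultdict(int)
    with open(log_path, errors="ignore") as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                dev = m.group("device")
                counts[dev] += 1
                worst[dev] = max(worst[dev], int(m.group("ms")))
    for dev in counts:
        print(f"{dev}: {counts[dev]} warnings, worst {worst[dev]} ms")

# Placeholder path -- point this at your actual StarWind log file.
tally(r"C:\Program Files\StarWind Software\StarWind\logs\starwind.log")
```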
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 1:02 pm

I forgot to mention that I get thousands of these messages.
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 3:08 pm

UPDATE:

I am also noticing this error:

High Availability Device iqn.2008-08.com.starwindsoftware:kalik-logs, synchronization Connection IP 10.254.4.102 with Partner Node iqn.2008-08.com.starwindsoftware:redstripe-logs lost

My two StarWind servers have 4 copper 10GbE links that I cross-connect to each other for sync; I do not have a switch between them. I also have a 1GbE link for heartbeat. I am noticing that these links occasionally drop: any of the 5 links drops randomly, then reconnects 5-6 seconds later.
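
To put timestamps and durations on those drops, here is a rough Python sketch (Windows ping syntax; the address list is an example seeded with the sync IP from the error above, so substitute your own sync and heartbeat IPs):

```python
import subprocess, time
from datetime import datetime

# Peer addresses to watch -- substitute your four 10GbE sync IPs
# plus the 1GbE heartbeat IP.
LINKS = ["10.254.4.102"]

def is_up(ip):
    """One ICMP echo with a 1-second timeout (Windows ping flags)."""
    r = subprocess.run(["ping", "-n", "1", "-w", "1000", ip],
                       stdout=subprocess.DEVNULL)
    return r.returncode == 0

down_since = {}
while True:
    for ip in LINKS:
        if is_up(ip):
            if ip in down_since:
                outage = time.time() - down_since.pop(ip)
                print(f"{datetime.now()} {ip} back after {outage:.1f}s")
        elif ip not in down_since:
            down_since[ip] = time.time()
            print(f"{datetime.now()} {ip} DOWN")
    time.sleep(1)
```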
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 23, 2015 4:00 pm

Also, I am using LSFS HA devices for a Hyper-V Failover Cluster. The problem really begins in earnest when I start restoring a SQL DB in a Server 2008 VM.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Fri Jul 24, 2015 8:34 am

Hi scourtney2000,

Storage performance degradation is an old issue; I've also had it in my production environment on a previous build. It is usually connected with L1 cache on older builds, so you face a tough decision: update your current StarWind version (if it is not the latest one) or disable L1 cache on these devices.

The other possible cause is an underlying RAID storage problem. Check your event log for RAID-related events to see whether there are any :(
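
For example, a quick Python sketch that shells out to the built-in wevtutil to pull recent storage errors from the System log. The provider names below are common storage drivers, not StarWind-specific, and may differ for your RAID controller:

```python
import subprocess

# Event providers that typically surface disk/RAID trouble; adjust for your
# controller's driver (e.g. an LSI/PERC-specific provider name).
QUERY = ("*[System[Provider[@Name='disk' or @Name='storport' or @Name='iScsiPrt']"
         " and (Level=1 or Level=2 or Level=3)]]")

# Newest 50 matching events, rendered as text.
out = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{QUERY}", "/f:text", "/c:50", "/rd:true"],
    capture_output=True, text=True)
print(out.stdout or "No matching disk/storport events found.")
```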
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Mon Jul 27, 2015 3:46 pm

I can confirm both of darklight's statements.

Could you provide us with an update? Did you have a chance to try darklight's recommendations?
scourtney2000
Posts: 5
Joined: Thu Jul 23, 2015 12:59 pm

Thu Jul 30, 2015 11:09 pm

I have not tried disabling the L1 cache, because I already have the latest version.

My RAID does not appear to be generating any errors.

What is strange is that, so far, this issue only manifests itself during the restoration of a SQL DB.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Wed Aug 05, 2015 5:15 pm

Why don't you have jumbo frames enabled?

https://knowledgebase.starwindsoftware. ... important/

Symptom 1... maybe that is the case?
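
If you do enable them, verify that full-size frames actually survive the path end to end. A small Python sketch using Windows ping with the don't-fragment flag (8972 bytes of payload + 28 bytes of ICMP/IP headers = one 9000-byte jumbo frame; the IP is the sync address from the posts above):

```python
import subprocess

def df_ping_ok(ip, payload):
    """Send one non-fragmentable ICMP echo of the given payload size (Windows flags)."""
    r = subprocess.run(["ping", "-n", "1", "-f", "-l", str(payload), ip],
                       stdout=subprocess.DEVNULL)
    return r.returncode == 0

ip = "10.254.4.102"  # one of the sync link addresses from this thread
# 1472 fits a standard 1500-byte MTU; 8972 needs a full 9000-byte jumbo path.
for size in (1472, 4072, 8972):
    print(f"payload {size}: {'passes' if df_ping_ok(ip, size) else 'fragmented/dropped'}")
```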
Tarass (Staff)

Tue Aug 11, 2015 10:50 am

Hi everybody, any updates on that case?
Branin
Posts: 31
Joined: Mon May 04, 2015 5:22 pm

Mon Sep 21, 2015 11:14 am

I can confirm that I also have the same problem (not thousands of messages though, just a few every few days) on both nodes of my hyperconverged 2-node setup (with two 10GbE links cross-connected for iSCSI and heartbeat). I have the latest version, with only L1 cache (I had some severe corruption problems with L2 cache enabled). Flat files.

I also get VSS errors from my Veeam backup application, which I assume are related.
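
One quick way to see whether VSS itself is unhappy is to check the writer states. A minimal Python sketch that parses the output of the built-in "vssadmin list writers" command (run from an elevated prompt; the parsing is approximate):

```python
import re, subprocess

# 'vssadmin list writers' must be run from an elevated prompt.
out = subprocess.run(["vssadmin", "list", "writers"],
                     capture_output=True, text=True).stdout

# Each writer block contains "Writer name:", "State:" and "Last error:" lines;
# report only writers whose last error is not "No error".
for block in out.split("Writer name:")[1:]:
    name = block.splitlines()[0].strip().strip("'")
    state = re.search(r"State:\s*(.+)", block)
    error = re.search(r"Last error:\s*(.+)", block)
    if error and "No error" not in error.group(1):
        state_txt = state.group(1).strip() if state else "unknown"
        print(f"{name}: {state_txt} -- {error.group(1).strip()}")
```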

Branin
Rajesh.Rangarajan
Posts: 6
Joined: Tue Jun 23, 2015 4:43 pm

Wed Sep 23, 2015 9:54 am

You can try disabling L1 cache - it solved the storage performance degradation issue for me. Good news - a new build should be released soon with some L1 cache improvements and tweaks :)

Also, I would recommend using L2 cache ONLY in write-through mode, as it will protect you from data corruption if the SSD drive fails.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Sep 25, 2015 1:33 pm

Even with a very fast all-flash back end, using RAM to absorb a HUGE amount of writes still does no good. The hardware RAID controller guys have the same issue so far.

To make a long story short: every setup should be fine-tuned by StarWind engineers. We're changing our support and installation policy to match that.

Also, you're right: we're boosting cache performance, and the upcoming version will have great improvements. The one currently in development has 4x better numbers, FYI.
Rajesh.Rangarajan wrote:You can try disabling L1 cache - it solved the storage performance degradation issue for me. Good news - a new build should be released soon with some L1 cache improvements and tweaks :)

Also, I would recommend using L2 cache ONLY in write-through mode, as it will protect you from data corruption if the SSD drive fails.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

boeiend
Posts: 4
Joined: Fri Apr 24, 2015 11:28 am

Sun Nov 01, 2015 8:27 pm

Rajesh.Rangarajan wrote:You can try disabling L1 cache - it solved the storage performance degradation issue for me. Good news - a new build should be released soon with some L1 cache improvements and tweaks :)

Also, I would recommend using L2 cache ONLY in write-through mode, as it will protect you from data corruption if the SSD drive fails.
Any time schedule for this new release? I had the exact same problem; disabling L1 cache solved it for me as well...
Vladislav (Staff)
Staff
Posts: 180
Joined: Fri Feb 27, 2015 4:31 pm

Thu Nov 05, 2015 5:06 pm

Hello,

The new StarWind build will be available by the end of this week.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Thu Nov 05, 2015 9:03 pm

Yeah, we've all heard that here already... https://forums.starwindsoftware.com/vie ... =30#p24707

... over a month ago :(