BUG: Incredibly high latency on HA luns in version 7509

Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Thu Feb 12, 2015 9:30 pm

"Is this error present in all v8 editions? Is a downgrade (prior v8-version) possible?"
To make a long story short, it is better to stick with the version that you have at the moment. Downgrading will still require a service restart, which will cause the situation that you already had. We will have the update soon.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Wed Feb 18, 2015 8:08 pm

Any news? I don't like running on a single node..
cabi
Posts: 1
Joined: Mon Jan 12, 2015 2:31 am

Thu Feb 19, 2015 2:55 pm

We're still on v6.6399 and starting to look at v8. Should we wait, or is v8 stable enough for production? Has this bug been fixed?
epalombizio
Posts: 67
Joined: Wed Oct 27, 2010 3:40 pm

Thu Feb 19, 2015 3:54 pm

I'd stick with what you have. So far, I haven't found a single version that hasn't had at least one major issue; you'll just need to decide what you can live with.

I'm finding that in the latest version of V8, when using HA, even with the workaround indicated in a previous post, latency can become unbearable on some LUNs. I went ahead and disabled HA on one of my targets and it crashed the service on my partner Starwind server. When I say unbearable, I'm talking latencies in the 30000ms range, even worse than my previous graphs.

In my opinion, HA is still not bulletproof as I've been able to crash the service on at least one of my partner nodes performing a basic operation within the confines of the product. Interestingly, when I test HA by cold booting an HA member, HA works as expected and is pretty reliable. In my latest case, I just removed a replica for an HA target using the UI, and the service crashed. Starwind VSAN is promising, but the basic features need to be bullet-proof. This is meant to be used in production. All of the other features are irrelevant if the basics don't work reliably.

There needs to be transparency when bugs are discovered and this info needs to be communicated to the community in some fashion. I've requested that a post with known issues be populated and maintained to help make our decisions easier as to whether to upgrade to a certain version, but this still hasn't happened yet. We need to know what features are production ready vs ones that are still considered beta.

I hope that Starwind begins to take this aspect more seriously as patience only goes so far.
MichelZ
Posts: 34
Joined: Sun Mar 16, 2014 10:38 am

Tue Feb 24, 2015 12:25 pm

Any news on this topic? When is a fixed version expected?
lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Thu Feb 26, 2015 3:48 pm

Is this fixed now? This must be fixed NOW! We cannot run with single targets for weeks... Not good enough..
MichelZ
Posts: 34
Joined: Sun Mar 16, 2014 10:38 am

Fri Feb 27, 2015 8:42 am

There's nothing mentioned in the release notes from yesterday.... :(
I'm still installing it for the LSFS fixes. Can't get worse than it currently is, can it?
lohelle
Posts: 144
Joined: Sun Aug 28, 2011 2:04 pm

Fri Feb 27, 2015 9:29 am

This is a "show-stopper" error, and should be prioritized before eating, sleeping and all other less important things...
MichelZ
Posts: 34
Joined: Sun Mar 16, 2014 10:38 am

Fri Feb 27, 2015 9:39 am

It is incredibly annoying, yes. Also, on top of it, the full-syncs after an unclean shutdown eat away our SSD write-endurance :oops: (all-ssd storage)
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Fri Feb 27, 2015 5:24 pm

Gentlemen,
I'm really sorry to hear you're facing these issues.
We already addressed part of them in the build released yesterday, and we'll do 1 more update in 1-2 weeks specifically addressing caching improvements.
The storage performance degradation issue mentioned above seems to disappear after we added the caching improvements to the code; however, it needs to be thoroughly tested before we can let it into production environments.

I know all of you are currently in touch with our tech support and I'll make sure our engineers are getting on the same page with you as soon as they get an update from R&D.
Thank you for understanding!
Max Kolomyeytsev
StarWind Software
epalombizio
Posts: 67
Joined: Wed Oct 27, 2010 3:40 pm

Fri Feb 27, 2015 9:48 pm

Max - Thanks for the note.. Just so I'm clear, you are saying the new build can go into production environments? In my chats with support, they indicated they fixed the latency issue with the L1 cache, which is priority for me, so I'm eager to try it if you can confirm it's production ready.

Thanks
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Tue Mar 03, 2015 5:22 pm

We did fix that, but the build with the fixes is in the QA department right now, and it is in the final testing stage.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
epalombizio
Posts: 67
Joined: Wed Oct 27, 2010 3:40 pm

Fri Mar 20, 2015 1:32 pm

Any new news on this? I still am affected by this.. Latency warnings from Starwind will show latencies in the 10 second range.

Thanks
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Mar 23, 2015 11:02 am

Hi guys!

It looks like we've isolated the issue, and we will upload a build with fixes as soon as we have it.

BTW, please let me know if any of you are interested in testing the build at the Beta stage. Just drop me a quick PM (don't forget to include a link to this thread, please).
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
epalombizio
Posts: 67
Joined: Wed Oct 27, 2010 3:40 pm

Mon Mar 23, 2015 2:30 pm

Anatoly,
I'm confused. I thought this was isolated almost a month ago. I just want to make sure I'm on the same page with Starwind here. In this post from February 27th, it's supposed to already be in QA:

"Gentlemen,
I'm really sorry to hear you're facing these issues.
We already addressed part of them in the build released yesterday, and we'll do 1 more update in 1-2 weeks specifically addressing caching improvements.
The storage performance degradation issue mentioned above seems to disappear after we added the caching improvements to the code; however, it needs to be thoroughly tested before we can let it into production environments.

I know all of you are currently in touch with our tech support and I'll make sure our engineers are getting on the same page with you as soon as they get an update from R&D.
Thank you for understanding!"

Some transparency would be great here.

Thanks