
5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Public beta (bugs, reports, suggestions, features and requests)

Moderators: art (staff), anton (staff), Anatoly (staff), Max (staff)

Postby rchisholm » Tue Jul 26, 2011 6:12 pm

I upgraded our 5.7 Beta HA SAN to the released version of 5.7. Now, when one of our HP BL465c G7 servers (NC551i iSCSI HBAs, MPIO RR, running W2K8) puts a heavy write load on a target, the target loses sync. Sometimes it even causes all of the targets on the SAN to lose sync and the management console to disconnect. It stopped for a few hours yesterday after I applied the newest license file that came with our service renewal, so I thought, strangely enough, that the license might have fixed it. It is doing it again, though. I didn't see this happening in 5.7 Beta, but testing from the new SQL blade servers was extremely limited before this week, so we may have had the issue before without my knowing it. Has anyone else run into a similar problem? The new SQL servers are still in testing, but I need to fix this before we go into production.
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby Alex (staff) » Wed Jul 27, 2011 9:53 am

Thank you for the message! We found this issue a couple of days ago. The fix is ready; we are testing it now so that we can update the release version.
Best regards,
Alexey.
Alex (staff)
Staff
 
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby rchisholm » Thu Jul 28, 2011 2:13 pm

Any chance I'll have the fix before the weekend? I have a very busy schedule of testing next week and it would be quite helpful if I can upgrade the SANs over the weekend.
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby Alex (staff) » Thu Jul 28, 2011 2:37 pm

I am not sure. Testing can take several days, because this issue is not easy to reproduce.
Best regards,
Alexey.
Alex (staff)
Staff
 
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby rchisholm » Fri Jul 29, 2011 3:39 pm

I turned on flow control on the switches between the SANs and the 10 GbE flex connects in the blade systems a few days ago. We haven't had it drop sync since then. Is it possible that is the fix for us? I'm going to beat the heck out of it over the weekend and see if I can get it to fail again.
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby rchisholm » Mon Aug 01, 2011 1:05 pm

Enabling flow control on the switches seemed to fix the out-of-sync problems, but it also seemed to cause pauses of a few seconds at a time and killed performance. I have now disabled flow control on the NICs and the switches. Performance is back up to where it should be, and so far I haven't knocked the iSCSI targets out of sync. I'm going to continue thrashing it to see what happens.
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby Alex (staff) » Mon Aug 01, 2011 1:18 pm

Thank you for the update!

The issue that caused the sync loss is related to how long it takes to get a response from the HA partner. It looks like changing flow control affects the response time, and so affects whether the issue appears.
Best regards,
Alexey.
Alex (staff)
Staff
 
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby nbarsotti » Mon Aug 01, 2011 3:14 pm

So if there is a known bug in 5.7, when will an updated build of 5.7 be released? I don't want to decrease my stability when I upgrade from 5.6. How long should I wait?
nbarsotti
 
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby Alex (staff) » Mon Aug 01, 2011 3:21 pm

No later than Thursday, August 4th.
Best regards,
Alexey.
Alex (staff)
Staff
 
Posts: 177
Joined: Sat Jun 26, 2004 8:49 am

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby nbarsotti » Mon Aug 01, 2011 3:33 pm

Great, I will plan my upgrade to 5.7 on Friday afternoon, PDT.
nbarsotti
 
Posts: 38
Joined: Mon Nov 23, 2009 6:22 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby rchisholm » Mon Aug 01, 2011 3:47 pm

I don't know about other people's setups, but so far I'm seeing a nice performance increase, and it appears to be stable with flow control turned off across the board. Of course, this iSCSI network is all HP 10GbE, so your mileage definitely may vary. I've been thrashing it from multiple servers with combinations of reads and writes, and so far it's working great.
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby kmax » Mon Aug 01, 2011 8:54 pm

With flow control on, did it affect max speed, or did you see more of a zig-zag pattern? Meaning, did it hit the max, then back down, then back up, etc. with it enabled?
kmax
 
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby rchisholm » Mon Aug 01, 2011 9:06 pm

Zig-zag pattern, with some pauses of a couple of seconds with no traffic. The HP NC522SFP+ NICs in the servers really don't seem to play nicely with flow control. Before I upgraded the firmware on them in our DL380 G7s running vSphere 4.1, the ones used for iSCSI on vSphere would actually shut down from the pause packet flooding, and I would have to power the server off and back on to fix it.
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby Bohdan (staff) » Mon Aug 08, 2011 9:32 am

StarWind 5.7 has been updated. It should solve the problem. Please let us know about the results.
Bohdan (staff)
Staff
 
Posts: 437
Joined: Wed May 23, 2007 12:58 pm

Re: 5.7 Beta Upgraded to 5.7 - Losing HA Sync Now

Postby rchisholm » Sun Aug 14, 2011 9:27 pm

Upgraded to the latest version this weekend. Upgrading the 1st node went perfectly. Everything fast synced in a matter of minutes. 2nd node hung for a while during the startup of the service and seems to have caused about half of the targets to lose sync and need a full sync. I'm seeing over 12 Gb/s on the sync though with 2 10GbE NICs teamed. :shock:

We'll bang on it really hard this week and see what happens. I like where the performance is going. Can't wait for the SSD cache integration. I should have 48 Intel X25-E 32GB drives available for it.
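
For scale, here is a rough back-of-the-envelope check of the figures in this post. It is only a sketch: the 20 Gb/s ceiling assumes both teamed 10GbE links are fully usable and ignores protocol overhead, and the capacity figure is raw (no RAID or formatting overhead). The function names are made up for illustration.

```python
# Rough arithmetic only. Input figures come from the post above;
# the "ideal" aggregate bandwidth is an assumption (no overhead).
LINK_GBPS = 10.0       # nominal rate of one 10GbE link
LINKS = 2              # two NICs teamed
OBSERVED_GBPS = 12.0   # sync rate reported in the post
DRIVES = 48            # Intel X25-E drives mentioned
DRIVE_GB = 32          # capacity per drive, in GB


def utilization(observed_gbps, link_gbps, links):
    """Fraction of the nominal aggregate link bandwidth in use."""
    return observed_gbps / (link_gbps * links)


def gbps_to_gigabytes_per_sec(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def raw_capacity_gb(drives, drive_gb):
    """Raw capacity of the drive pool, before any RAID or formatting overhead."""
    return drives * drive_gb


print(f"{utilization(OBSERVED_GBPS, LINK_GBPS, LINKS):.0%} of nominal aggregate")
print(f"{gbps_to_gigabytes_per_sec(OBSERVED_GBPS):.2f} GB/s")
print(f"{raw_capacity_gb(DRIVES, DRIVE_GB)} GB raw SSD capacity")
```

Under those assumptions, 12 Gb/s is about 60% of the nominal 20 Gb/s team bandwidth, or roughly 1.5 GB/s, and the 48 drives come to 1536 GB raw.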
rchisholm
 
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm
