Problems with Primary Target shortly after upgrade to 5.6

mooseracing
Posts: 91
Joined: Mon Oct 11, 2010 11:55 am

Fri Feb 25, 2011 12:06 am

As the title suggests, I started having issues about a week after upgrading to 5.6. All of a sudden, when my primary is online, none of my VMs on the iSCSI storage respond. As soon as I shut down the primary and the partner takes over, everything works great. My partner can't run everything very well at the moment; I'm waiting for parts to arrive so I can build a decent one.

I thought there might have been some corruption, so the last time I brought the primary up I ran a full sync after the fast sync, but that didn't help.

I am using this in a Server 2008 cluster with VMs on the iSCSI HA storage. Both of my cluster servers show as connected to the iSCSI target when this problem occurs.

Ideas?
@ziz (staff)
Posts: 57
Joined: Wed Aug 18, 2010 3:44 pm

Fri Feb 25, 2011 11:21 am

Check the status of the current and partner servers in the StarWind management console. You should see:
- Synchronization status: Synchronized
- Ready: Yes
Also check your network link to the primary server.
Check that the MS iSCSI initiator on every server in the cluster shows both the primary and partner servers as connected.
If the HA image on the primary server is corrupted, you can recreate the HA device using the existing image from the partner on one side and a new image on the other side. In the final step, choose the correct synchronization direction so that the new image is synchronized from the existing one.
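The checklist above can be sketched as a small Python function. This is only an illustration under assumptions: the status values are typed in by hand from the management console and the MS iSCSI initiator (the sketch does not query StarWind or Windows), the network link check is left out, and the names `NodeStatus` and `check_ha_pair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # Values copied by hand from the StarWind management console;
    # this sketch does not talk to StarWind itself.
    name: str
    sync_status: str  # e.g. "Synchronized"
    ready: bool       # the console's "Ready: Yes/No"


def check_ha_pair(nodes, initiator_targets):
    """Return a list of problems; an empty list means every check passed.

    nodes: NodeStatus entries for the primary and the partner.
    initiator_targets: hypothetical mapping of cluster host name -> set of
        HA node names that host's MS iSCSI initiator shows as connected.
    """
    problems = []
    # Each HA node should report Synchronized and Ready.
    for node in nodes:
        if node.sync_status != "Synchronized":
            problems.append(f"{node.name}: sync status is {node.sync_status!r}")
        if not node.ready:
            problems.append(f"{node.name}: not ready")
    # Every cluster host should be connected to every HA node.
    expected = {node.name for node in nodes}
    for host, connected in sorted(initiator_targets.items()):
        missing = expected - set(connected)
        if missing:
            problems.append(f"{host}: not connected to {', '.join(sorted(missing))}")
    return problems
```

For example, with both nodes reporting Synchronized and Ready, but a host `cluster2` whose initiator shows only the partner, `check_ha_pair` returns `['cluster2: not connected to primary']`. Note that in this thread everything reported healthy while the problem was occurring, so passing these checks does not rule out a fault.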
Aziz Keissi
Technical Engineer
StarWind Software
mooseracing
Posts: 91
Joined: Mon Oct 11, 2010 11:55 am

Fri Feb 25, 2011 1:41 pm

Right now I am finishing moving the VMs off the iSCSI boxes so I can recreate the HA targets.

Everything was synchronized and in good status, and my cluster boxes were connected to both targets (as shown in both Windows and StarWind).
kmax
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Fri Feb 25, 2011 5:27 pm

I feel better knowing I'm not the only one. I ran into this exact situation on my 3-node failover cluster with 5 HA targets.

I tried this twice. The first time was before SP1 for R2. I upgraded one node and resynced, then upgraded the second, and the sync finished. During all of this, the nodes started having problems and logged iSCSI events in their event logs. Rebooting the nodes didn't help. I rolled back to 5.5 and everything was fine.

I tried again after upgrading all failover nodes to SP1. This time I installed 5.6 on only one of the nodes and waited to see if it had issues. It resynced fine, but the cluster had exactly the same problems: I couldn't access the shared volumes and everything became unusable. Rolling back to 5.5 on that one server solved the issue.

It is hard for me to test much, since the VMs are in use nearly 24 hours a day and downtime is a real problem.

I will also add some weirdness from all of this: the second time, one node was still working, and the VMs and the CSV it owned were fine. Also, refreshing the display in the 5.6 management console occasionally crashed the application. That never happened in 5.5.

NICs and drivers are identical in all three nodes.
mooseracing
Posts: 91
Joined: Mon Oct 11, 2010 11:55 am

Fri Feb 25, 2011 6:47 pm

I didn't get any event log errors, which was the worst part: tracking down where the problem was. Everything looked fine. Like you said, the management console did crash on and off when I tried a refresh.

I tried creating new HA targets on 5.6, and that didn't help. I rolled back to 5.5 and it's working normally again. I moved one of our DCs back over to put at least some load on it and keep an eye on it.
Andrew (staff)
Staff
Posts: 5
Joined: Tue Apr 13, 2010 8:32 am

Fri Feb 25, 2011 7:14 pm

Hello,

Possibly we already know the answer: it could be a bug in the event logger, and we already have a fix for it.
Do you have any *.mdump files in the ..\Program Files\StarWind Software\StarWind\ folder?
If so, please send them to swsdev@starwindsoftware.com or upload them to mediafire.com.
mooseracing
Posts: 91
Joined: Mon Oct 11, 2010 11:55 am

Fri Feb 25, 2011 8:30 pm

Nope, I don't see any. Would they have stayed there after I rolled back to 5.5?
kmax
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Fri Feb 25, 2011 8:59 pm

I don't see any either.
Andrew (staff)
Staff
Posts: 5
Joined: Tue Apr 13, 2010 8:32 am

Sat Feb 26, 2011 1:41 pm

Yes, *.mdump files would remain after a rollback, so it's not the event log bug.
Guys, please archive your log files and send them to us.
Also, let us know the date and approximate time the problem occurred.
camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Sun Feb 27, 2011 9:20 pm

Is rolling back to 5.5 as simple as stopping the service on one side, installing 5.5, and resyncing? Will 5.5 install on top of 5.6, or do you have to uninstall first?

Thanks!
@ziz (staff)
Posts: 57
Joined: Wed Aug 18, 2010 3:44 pm

Mon Feb 28, 2011 8:35 am

It can be installed on top.
Install on one side, sync, then install on second side and sync.
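The one-side-at-a-time order can be sketched in Python as below. `rolling_install`, `install` and `sync_status` are hypothetical names, not StarWind APIs; the operator would supply the two callables (for example, wrappers around a manual install step and a check of the console's synchronization status). The sketch only captures the ordering: the second side is not touched until the first reports Synchronized.

```python
import time


def rolling_install(nodes, install, sync_status,
                    poll_seconds=30.0, timeout_seconds=4 * 3600.0,
                    sleep=time.sleep):
    """Install on one HA node at a time, waiting for "Synchronized" in between.

    install(node) and sync_status(node) are hypothetical, operator-supplied
    callables; they are not part of any StarWind API.
    """
    for node in nodes:
        install(node)
        waited = 0.0
        # Do not move on to the next node until this one has resynced.
        while sync_status(node) != "Synchronized":
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"{node} did not resynchronize within {timeout_seconds} s")
            sleep(poll_seconds)
            waited += poll_seconds
```

The `sleep` parameter is injectable so the ordering can be checked without actually waiting.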
Aziz Keissi
Technical Engineer
StarWind Software
mooseracing
Posts: 91
Joined: Mon Oct 11, 2010 11:55 am

Mon Feb 28, 2011 12:30 pm

Yep, this is what I did. It did fast syncs, not full syncs, in case you are wondering about that as well.
@ziz (staff)
Posts: 57
Joined: Wed Aug 18, 2010 3:44 pm

Mon Feb 28, 2011 12:48 pm

The service determines whether it needs a fast or a full sync based on the amount of data that differs between the two HA partners. Either way, it works correctly.
Aziz Keissi
Technical Engineer
StarWind Software
Andrew (staff)
Staff
Posts: 5
Joined: Tue Apr 13, 2010 8:32 am

Tue Mar 01, 2011 5:07 pm

Hello guys,

Did you reset your MS cluster nodes, or do anything else to them, before the problem occurred?
kmax
Posts: 47
Joined: Thu Nov 04, 2010 3:37 pm

Tue Mar 01, 2011 7:06 pm

No, I basically just upgraded to 5.6. The problems start after the resync on the first node.

However, I may have spotted something. Mooseracing, what NICs are you using, and are you using any teaming?