Node Disconnects and HA replication path lost


xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Tue Jul 17, 2018 1:07 am

I have the following setup in a DEV/LAB environment.
Two Dell R510 servers running Windows Server 2016 with StarWind V8.0.12166:
2x Xeon X5660
32GB RAM
Perc H700 with 12x 4TB SAS in RAID10
1x Intel 1.6TB NVMe
2x 1GbE ports for iSCSI Targets
2x 10GbE Mellanox ConnectX-3 for Sync

Last week I ran into an issue where the node I have named "SAN01" marked all of its sync channels as "offline" and then started showing the following in the event logs:
HA Device iqn.2008-08.com.starwindsoftware:###-san01-###-##-#####: partner node iqn.2008-08.com.starwindsoftware:###-san02-###-##-##### state has changed to "Not synchronized".
I tried to run the PowerShell script for synchronizing all disks, but that did not work. As I'm still in the trial period I fired up the GUI, only to find that it would not connect to the SAN01 node.
As a last-ditch effort I rebooted SAN01, but it hung for nearly 12 hours. At that point another engineer rebooted the system and we had disk corruption, which I'm not blaming on anyone but us. (This is storage and it's not easy.)
However, today I'm encountering the same issue on one of my disks. I have collected my system logs and was wondering if someone could take a look and maybe let me know what I'm doing wrong?
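
The "Not synchronized" event quoted in this post has a regular shape, so it can be pulled out of an exported event log with a small script. The following is only a minimal sketch in Python, assuming the message text matches the line quoted above; the function name and the regular expression are illustrative and are not part of any StarWind tooling.

```python
import re

# Pattern modeled on the single event message quoted in this thread.
# Assumption: other StarWind builds may word the message differently.
HA_STATE_RE = re.compile(
    r'HA Device (?P<device>iqn\.\S+): '
    r'partner node (?P<partner>iqn\.\S+) '
    r'state has changed to "(?P<state>[^"]+)"'
)


def parse_ha_state_change(line):
    """Return the device, partner and state from one event message, or None."""
    match = HA_STATE_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Sample line copied from the post, with the masked IQN parts left as-is.
    sample = (
        'HA Device iqn.2008-08.com.starwindsoftware:###-san01-###-##-#####: '
        'partner node iqn.2008-08.com.starwindsoftware:###-san02-###-##-##### '
        'state has changed to "Not synchronized".'
    )
    print(parse_ha_state_change(sample))
```

Running this over each line of an exported log and keeping the non-`None` results gives a quick list of which partner node changed state and what the new state was.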
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Fri Jul 20, 2018 2:09 pm

What StarWind build are you using?
Can you please collect the logs from your servers and share them with us so we can better understand the problem you faced?
You can collect the logs using this tool.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Wed Jul 25, 2018 1:12 am

This is the latest build. StarWind V8.0.12166

I've got logs, but I'll collect them again because the system dumped itself when I attempted to format a volume with an oVirt node this evening. Not sure what happened there.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Wed Jul 25, 2018 10:11 am

For proper operation with Linux, refer to https://knowledgebase.starwindsoftware. ... initiator/ and introduce the change suggested there.
Last edited by Boris (staff) on Wed Jul 25, 2018 9:41 pm, edited 1 time in total.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Wed Jul 25, 2018 8:47 pm

Thanks for the heads up. Made those changes and will be sending the logs in from both of my nodes shortly.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Jul 27, 2018 6:47 pm

Sure, waiting for any updates from you.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Fri Jul 27, 2018 7:04 pm

Have been using oVirt now for a few hours and haven't had any issues. It's still a pain in the rear to get MPIO on Linux to work, but that's Linux not StarWind. :)

I sent the logs in but haven't heard anything back, so I may have done it wrong. Should I have used this page https://www.starwindsoftware.com/support-form ?
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Fri Jul 27, 2018 7:54 pm

Yes, please use this form. Please refer to this forum thread.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Fri Jul 27, 2018 8:12 pm

Sounds good, uploading the logs again now.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Jul 30, 2018 9:15 pm

Unfortunately, we have not received the logs from you yet. Could you let us know whether you managed to send them?
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Mon Jul 30, 2018 9:44 pm

Odd... I've uploaded them twice. Let me try again. I think I used a browser with an add-on that is blocking the upload.
Trying again.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Mon Jul 30, 2018 9:57 pm

I just completed the upload. Can you confirm that you got it?
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Jul 30, 2018 10:12 pm

Got it where? Did you submit a ticket when uploading the logs?
I do not see any new ticket arriving. If attaching logs to the ticket at creation fails for you, simply create a ticket using that form, refer to this thread and we will provide you with further instructions regarding uploading the logs.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Tue Jul 31, 2018 2:32 pm

The files must have been too large for upload. Got the case created and uploaded the files to you. Will report back if/when we figure out what is going wrong.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Tue Jul 31, 2018 5:04 pm

The logs submitted by xpystchrisx showed interruptions in NIC operation, so the NICs should be the first thing to troubleshoot.

xpystchrisx,
Check whether the issue still reproduces after you install new NICs in your servers, and report the results.