
Node Disconnects and HA replication path lost

Posted: Tue Jul 17, 2018 1:07 am
by xpystchrisx
I have the following setup in a DEV/LAB environment.
Two Dell R510 servers running Server 2016 with StarWind V8.0.12166
2x Xeon X5660
32GB RAM
PERC H700 with 12x 4TB SAS in RAID10
1x Intel 1.6TB NVMe
2x 1GbE ports for iSCSI Targets
2x 10GbE Mellanox ConnectX-3 for Sync

Last week I ran into an issue where the node I have named "SAN01" marked all of its sync channels as "offline" and then started showing the following in the event logs:
HA Device iqn.2008-08.com.starwindsoftware:###-san01-###-##-#####: partner node iqn.2008-08.com.starwindsoftware:###-san02-###-##-##### state has changed to "Not synchronized".
I tried to run the PowerShell script that synchronizes all disks, but that did not work. As I'm still in the trial period, I fired up the GUI, only to find that it would not connect on the SAN01 node.
As a last-ditch effort I rebooted SAN01, but it hung for going on 12 hours. At that point another engineer rebooted the system and we had disk corruption, which I'm not blaming anyone but us for. (This is storage, and it's not easy.)
However, today I'm encountering the same issue on one of my disks. I have collected my system logs and was wondering if someone could take a look and maybe let me know what I'm doing wrong?
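
For anyone triaging a similar run of events, here is a minimal Python sketch that pulls the device, partner, and state out of messages shaped like the one quoted above and counts "Not synchronized" transitions per partner. It is not StarWind tooling. The regular expression is modeled only on that single quoted message, so other builds or event types may be worded differently, and the names `StateChange`, `parse_state_change`, and `count_not_synchronized` are made up for illustration.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional


class StateChange(NamedTuple):
    device: str   # IQN of the local HA device
    partner: str  # IQN of the partner node
    state: str    # new state reported for the partner


# Assumption: the wording matches the one event message quoted in this thread.
_EVENT_RE = re.compile(
    r'HA Device (?P<device>\S+): partner node (?P<partner>\S+) '
    r'state has changed to "(?P<state>[^"]+)"'
)


def parse_state_change(line: str) -> Optional[StateChange]:
    """Return the parsed event, or None if the line has a different shape."""
    match = _EVENT_RE.search(line)
    if match is None:
        return None
    return StateChange(match.group("device"), match.group("partner"), match.group("state"))


def count_not_synchronized(lines: Iterable[str]) -> Dict[str, int]:
    """Count "Not synchronized" transitions, keyed by partner IQN."""
    counts: Dict[str, int] = {}
    for line in lines:
        event = parse_state_change(line)
        if event is not None and event.state == "Not synchronized":
            counts[event.partner] = counts.get(event.partner, 0) + 1
    return counts
```

Feeding it the exported event log text line by line gives a quick view of which partner keeps dropping out of sync, which can then be lined up against NIC or switch events from the same time window.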

Re: Node Disconnects and HA replication path lost

Posted: Fri Jul 20, 2018 2:09 pm
by Oleg(staff)
What StarWind build are you using?
Can you please collect the logs from your servers and share them with us so we can better understand the problem you faced?
You can collect the logs using this tool.

Re: Node Disconnects and HA replication path lost

Posted: Wed Jul 25, 2018 1:12 am
by xpystchrisx
This is the latest build. StarWind V8.0.12166

I've got logs, but I'll collect them again because the system dumped itself when I attempted to format a volume with an oVirt node this evening. Not sure what happened there.

Re: Node Disconnects and HA replication path lost

Posted: Wed Jul 25, 2018 10:11 am
by Boris (staff)
For proper operation with Linux, refer to https://knowledgebase.starwindsoftware. ... initiator/ and introduce the change suggested there.

Re: Node Disconnects and HA replication path lost

Posted: Wed Jul 25, 2018 8:47 pm
by xpystchrisx
Thanks for the heads up. Made those changes and will be sending the logs in from both of my nodes shortly.

Re: Node Disconnects and HA replication path lost

Posted: Fri Jul 27, 2018 6:47 pm
by Boris (staff)
Sure, waiting for any updates from you.

Re: Node Disconnects and HA replication path lost

Posted: Fri Jul 27, 2018 7:04 pm
by xpystchrisx
Have been using oVirt now for a few hours and haven't had any issues. It's still a pain in the rear to get MPIO on Linux to work, but that's Linux not StarWind. :)

I sent the logs in but haven't heard anything back; I may have done it wrong. Should I have used this page: https://www.starwindsoftware.com/support-form ?

Re: Node Disconnects and HA replication path lost

Posted: Fri Jul 27, 2018 7:54 pm
by Oleg(staff)
Yes, please use that form, and refer to this forum thread when you submit it.

Re: Node Disconnects and HA replication path lost

Posted: Fri Jul 27, 2018 8:12 pm
by xpystchrisx
Sounds good, uploading the logs again now.

Re: Node Disconnects and HA replication path lost

Posted: Mon Jul 30, 2018 9:15 pm
by Boris (staff)
Unfortunately, we have not yet received the logs from you. Could you let us know whether you managed to send them?

Re: Node Disconnects and HA replication path lost

Posted: Mon Jul 30, 2018 9:44 pm
by xpystchrisx
Odd... I've uploaded them twice. Let me try again. I think I used a browser with an add-on that is blocking the upload.
Trying again.

Re: Node Disconnects and HA replication path lost

Posted: Mon Jul 30, 2018 9:57 pm
by xpystchrisx
I just completed the upload. Can you confirm that you got it?

Re: Node Disconnects and HA replication path lost

Posted: Mon Jul 30, 2018 10:12 pm
by Boris (staff)
Got it where? Did you submit a ticket when uploading the logs?
I do not see any new ticket arriving. If attaching the logs to the ticket at creation fails for you, simply create a ticket using that form, refer to this thread, and we will provide you with further instructions for uploading the logs.

Re: Node Disconnects and HA replication path lost

Posted: Tue Jul 31, 2018 2:32 pm
by xpystchrisx
The files must have been too large for upload. Got the case created and uploaded the files to you. Will report back if/when we figure out what is going wrong.

Re: Node Disconnects and HA replication path lost

Posted: Tue Jul 31, 2018 5:04 pm
by Boris (staff)
The logs submitted by xpystchrisx showed interruptions in NIC operation, and that should be the first thing to troubleshoot.

xpystchrisx,
Check whether the issue still reproduces with the new NICs you install in your servers, and report the results.
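
One rough way to watch for link interruptions while testing the NICs is to probe the partner's sync address at a fixed interval and count how often it goes from reachable to unreachable. The Python sketch below is an illustration only, not a StarWind utility. It assumes the partner answers TCP on port 3260 (the standard iSCSI port) over the sync link, which may not hold for every configuration, and the helper names are hypothetical.

```python
import socket
import time
from typing import Iterable, List


def tcp_reachable(host: str, port: int = 3260, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: port 3260 (standard iSCSI) is open on the partner's sync address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def count_interruptions(samples: Iterable[bool]) -> int:
    """Count transitions from reachable (True) to unreachable (False)."""
    interruptions = 0
    previous = True  # treat the link as up before the first sample
    for ok in samples:
        if previous and not ok:
            interruptions += 1
        previous = ok
    return interruptions


def probe(host: str, rounds: int, interval: float = 1.0) -> List[bool]:
    """Probe host once per interval and return the reachability samples."""
    samples = []
    for _ in range(rounds):
        samples.append(tcp_reachable(host))
        time.sleep(interval)
    return samples
```

Running `count_interruptions(probe(partner_ip, rounds=600))` from each node against the other node's sync address, before and after the NIC swap, gives a simple number to compare. A TCP probe only shows reachability at the sampling interval, so short drops between probes can be missed, and the NIC driver and switch logs remain the better evidence.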