Node Disconnects and HA replication path lost

xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Tue Jul 31, 2018 5:23 pm

We are working to source some replacement cables and NICs for testing. Will report back when we have updates. Thanks!
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Tue Jul 31, 2018 7:14 pm

Nice. Keep the thread updated.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Sun Aug 05, 2018 5:08 pm

X520-DA2s installed today. Just going through the initial resync and then we'll turn up all of the services. Thus far we've configured the Intel NICs with the Low Latency setting and jumbo frames enabled, and left the rest alone. Using the latest drivers from Intel's site.
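For anyone following along, one way to check that jumbo frames actually pass end to end on the sync and iSCSI links is a ping with the Don't Fragment bit set. Below is a minimal Python sketch that works out the payload size and builds the command. The 9000-byte MTU and the 172.16.10.2 address are assumptions for illustration only; neither is stated in this thread, so substitute your own values.

```python
from typing import List

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU is too small to carry an ICMP echo")
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(target: str, mtu: int, windows: bool = True) -> List[str]:
    """Build a ping command that fails if the path cannot carry `mtu` bytes."""
    payload = str(icmp_payload_for_mtu(mtu))
    if windows:
        # Windows ping: -f sets Don't Fragment, -l is payload size, -n is count
        return ["ping", "-f", "-l", payload, "-n", "4", target]
    # Linux iputils ping: -M do forbids fragmentation, -s is payload size, -c is count
    return ["ping", "-M", "do", "-s", payload, "-c", "4", target]


if __name__ == "__main__":
    # Hypothetical partner-node address on the sync network; assumed MTU of 9000
    print(" ".join(ping_command("172.16.10.2", 9000)))
```

If the replies come back, the whole path (both NICs plus any switch in between) is passing frames of that size. A "needs to be fragmented" error usually means something along the path is still set to a smaller MTU.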
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Sun Aug 05, 2018 8:00 pm

Well, in an interesting twist, the sync between nodes seems to be a bit more stable and quick. However, one of my drives is now refusing to sync, with "mount status: Failed" listed. Trying to bring it back, but it is being stubborn.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Mon Aug 06, 2018 1:22 pm

Just wanted to report back. This morning I removed and added the replica node back to the system. Where a re-sync on a drive with data would have taken approximately 15 minutes no matter what the drive size, this took approximately 30 seconds. I believe we may have either had bad Mellanox cards or drivers. Either way the system is using Intel now for both iSCSI targets and HA. Things are going much better this time around.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Aug 06, 2018 2:52 pm

I believe you just had some bad Mellanox card(s). In our experience, Mellanox NICs are actually the ones that cause the least trouble. Anyway, it is great you can enjoy StarWind VSAN now.
xpystchrisx
Posts: 26
Joined: Tue Jun 05, 2018 6:20 pm

Thu Aug 16, 2018 2:04 pm

I've just completed my second set of patching sessions since we replaced the bad Mellanox cards. The system is back up and 100% synchronized with no external intervention within about 30 minutes of the reboot. So I'm going to say that it was definitely the Mellanox cards or the DAC cables that were giving us the issues.
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Aug 16, 2018 2:21 pm

It's great we figured it out and you can finally enjoy our software.