2 Node HA - NIC/Performance Guidance

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Mon Oct 22, 2018 11:02 am

Hi,

I've got a 2 Node Storage-only setup, using HDD RAID10 on each Node. Disk performance locally is great, getting around 470MB/s Read/Write (sequential).

I have 2 x 1GB NICs on each node for Sync, connected node to node using cross-over cables. I have 4 x 1GB NICs for iSCSI traffic on each node.
I have 2 Hyper-V Hosts and 2 ESXi Hosts, each with 4 x 1GB NICs for iSCSI.

I know I have a non-recommended setup, but every installation will be different, so I'm after ideas on what others are using. VM Performance is good, but not as good as I am used to (I've come from running SSDs for Storage).

I can't see any Sync issues, but I haven't really looked into the logs, as things are working. Is there something I can monitor to see if the Sync channels are struggling with the load?
Should I increase the number of NICs for Sync?
Should I reduce the number of NICs for iSCSI (both on Storage Nodes and Hypervisor nodes)?

Any other advice to try and improve VM Performance would be appreciated. NOTE: At the moment, 10GB is not an option for me; this is a Dev environment and I want to keep costs low.

Thanks in advance.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Mon Oct 22, 2018 1:20 pm

Hi,

According to your description, the bottleneck in your configuration may be the 2 x 1GB NICs used for synchronization. HA storage performance is limited by the speed of synchronization between nodes. Could you please clarify how the synchronization and iSCSI channels are connected: as a team or separately? Please keep in mind that we do not recommend NIC teaming for synchronization and iSCSI channels.
Since 10GB NICs are not an option for you, you can add at least 1 more NIC for synchronization. You can check the current workload on the synchronization channel in the Task Manager or in the Resource Monitor.
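
For a number rather than a graph, link utilization can be worked out from two readings of an interface's cumulative byte counter. The following is only a minimal sketch in Python: the function name, the sample readings and the 1 Gbit/s link speed are illustrative assumptions, and the counter values would have to be taken from whatever the OS reports for the sync NIC.

```python
def link_utilization(bytes_before, bytes_after, interval_s,
                     link_bits_per_s=1_000_000_000):
    """Return link utilization as a fraction of line rate.

    bytes_before / bytes_after: cumulative byte counter readings
    interval_s: seconds elapsed between the two readings
    link_bits_per_s: nominal link speed (assumed 1 Gbit/s here)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bits_moved = (bytes_after - bytes_before) * 8
    return bits_moved / (interval_s * link_bits_per_s)


# Hypothetical readings taken 5 seconds apart on one sync NIC.
before = 10_000_000_000
after = 10_562_500_000
util = link_utilization(before, after, interval_s=5)
print(f"{util:.0%} of a 1 Gbit/s link")
```

If the sync NICs sit close to line rate whenever the VMs are writing heavily, that points at the sync channel rather than the disks or the iSCSI links.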
bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Mon Oct 22, 2018 1:36 pm

Hi Oleg,

Thanks for your response.

For Sync: NIC1 on each node is connected via Cross-over cable. NIC2 on each node is connected via Cross-over cable. The 2 channels are on separate subnets, no Teaming is present.
For iSCSI: 4 NICs on each node are connected via a Switch, on separate subnets, no Teaming is present.
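
A layout like this can be sanity-checked with a few lines of Python using the standard `ipaddress` module. This is just a sketch: the interface names and addresses below are made up for illustration and are not the ones used on these nodes.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(interfaces):
    """Return pairs of interface names whose subnets overlap.

    interfaces: dict mapping a name to an address in CIDR form,
    e.g. "172.16.10.1/24".
    """
    nets = {name: ipaddress.ip_interface(cidr).network
            for name, cidr in interfaces.items()}
    return [(a, b) for a, b in combinations(nets, 2)
            if nets[a].overlaps(nets[b])]


# Hypothetical addressing for one storage node (not taken from this thread).
node1 = {
    "sync1": "172.16.10.1/24",
    "sync2": "172.16.11.1/24",
    "iscsi1": "172.16.20.1/24",
    "iscsi2": "172.16.21.1/24",
    "iscsi3": "172.16.22.1/24",
    "iscsi4": "172.16.23.1/24",
}
clashes = overlapping_subnets(node1)
print(clashes or "no overlapping subnets")
```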

On the Hyper-V Hosts, I have a converged Network, with 4 x iSCSI VMNics.
On the ESXi Hosts, I have the 4 NICs on a vDS, with Failover, so only 1 NIC per Port Group.

I suspected the Sync channels may be the bottleneck, but wasn't sure how to confirm this. I'll check the Resource Monitor when I get a chance.
Does the above clear things up, and help you better judge where the issue is?
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Mon Oct 22, 2018 1:58 pm

Most probably the problem is related to the synchronization channels between nodes. HA storage performance is limited by the speed of synchronization between nodes.
You can add 1 more NIC for synchronization.
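
As a back-of-the-envelope illustration: assuming writes are mirrored synchronously to the partner node, write throughput cannot exceed what the sync links carry. The sketch below is a rough estimate in Python, not a measurement. It assumes 1 Gbit/s per NIC, even load spread across the links, and a guessed 90% protocol efficiency; the function name is illustrative.

```python
def sync_ceiling_mb_s(nic_count, link_gbit_s=1.0, efficiency=0.9):
    """Approximate upper bound on sync throughput in MB/s (decimal units).

    efficiency is an assumed allowance for protocol overhead.
    """
    raw_mb_s = nic_count * link_gbit_s * 1000 / 8  # Gbit/s -> MB/s
    return raw_mb_s * efficiency


local_disk_mb_s = 470  # sequential figure quoted in the first post

for nics in (2, 3, 4):
    ceiling = sync_ceiling_mb_s(nics)
    capped = min(ceiling, local_disk_mb_s)
    print(f"{nics} sync NICs: ~{ceiling:.0f} MB/s sync ceiling, "
          f"sequential writes capped near {capped:.0f} MB/s")
```

Under these assumptions, two 1GB links top out at roughly half of what the local RAID10 delivers sequentially, while four links come close to it.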
bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Mon Oct 22, 2018 2:06 pm

Thanks, I've got 2 spare NICs and crossover cables, so might just add the 2 extra NICs and see how it performs.

Also, I did change the Priority down to 25%, thinking it would improve client traffic. Would this have an impact?
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Mon Oct 22, 2018 2:29 pm

The synchronization priority only plays a role during full synchronization.
You can leave it at the default, 50%.
bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Mon Oct 22, 2018 3:37 pm

Oleg(staff) wrote: The synchronization priority only plays a role during full synchronization.
You can leave it at the default, 50%.

OK, thanks for the clarification.

I'll update this post after adding the additional sync channels.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Mon Oct 22, 2018 4:08 pm

Thank you!
bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Mon Oct 22, 2018 4:29 pm

Well, that's a disaster. As soon as I enabled the 2 extra NICs, the StarWind service on both Nodes hit 100% (I've seen this in the past).

I thought I'd then reboot each Node, one at a time. I've initiated a stop of the service on Node 1, and it's failing to stop.

Can I email you the logs? I don't want to reboot the server, as this will cause an improper shutdown, resulting in a Full Sync.
bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Mon Oct 22, 2018 6:54 pm

After waiting an hour, I had no choice but to reboot the server, and now it's doing a Full Sync (as expected) :(

How can I avoid this in future, when the service just does not want to stop???
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Tue Oct 23, 2018 10:26 am

Could you please collect logs using this tool?
Please log a support case via this form.
bkdilse
Posts: 58
Joined: Mon Sep 10, 2018 6:14 am

Tue Oct 23, 2018 10:46 am

Oleg(staff) wrote: Could you please collect logs using this tool?
Please log a support case via this form.

Just logged a case, and collected logs. Waiting for response on where to send them.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Tue Oct 23, 2018 3:09 pm

Thank you! We have sent you the details.