
Sync connection with partner node lost

Posted: Tue Sep 27, 2016 11:34 pm
by vhost310
Thanks in advance for any assistance. I've recently put together a 2-node hyperconverged setup using VMware ESXi 6 and Windows Server 2012 R2 Datacenter. Hardware and config specs are below. I'm getting occasional errors in the StarWind VMs stating that the connection to the partner node on the sync channel is lost (see below). About 2 seconds later it's re-established and all is well. This seems to happen at times of heavier load, though all that's running is an Exchange Server VM with 6 users. Everything seems to be running well and fast so far except for the occasional disconnects. Please let me know any other configuration specs I can provide. Thank you

Errors
High Availability Device iqn….swd01, all synchronization Connections with Partner Node iqn.….swd01 lost

High Availability Device "iqn.…swd01", critical device response time has been detected. (IO operation delay is more than 10 sec). To avoid performance issues on the whole device, automatic synchronization attempts will be performed every 30 minute(s).

High Availability Device iqn.….swd01, current Node passed to "Not synchronized" State, Reason is Synchronization Partner iqn.….swd01 Channel has been disconnected due to Timeout on local Storage Device

High Availability Device iqn….swd01, current Node State has changed to "Not synchronized"

then 2 seconds later:
High Availability Device iqn….swd01, synchronization Connection IP 10.10.30.11 with Partner Node iqn.….swd01 established

Hardware
(2) Supermicro SuperServer 2028R-C1RT4+
(1) Xeon E5-2620 v4
(2) Samsung 32GB 288-pin DDR4 2400 ECC RAM
(1) ESXi BOOT - SanDisk x400 128GB SSD SD8SB8U
(4) RAID-10 - Samsung SM863 480GB SSD
(1) LSI 3108 Onboard RAID Controller
(4) 10G Intel NICs
VMware vSphere 6 Enterprise Plus

Network
(1) Direct 10G connection for iSCSI
(1) Direct 10G connection for SYNC
(1) Connection to 1G switch for WAN, Heartbeat
(1) Connection to 1G switch for NAS/backup traffic
Cables - Tripplite Cat6a Shielded, 3ft
Jumbo frames enabled on direct iSCSI and SYNC connections inside SW VMs and in VMware

StarWind VMs
4 vCPU, 8GB RAM, Paravirtual SCSI, VMXNET3 NICs, Windows Server 2012 R2 Datacenter
VMware Disk - 750GB Thick Provision, Eager Zeroed
NETBIOS and DNS registration disabled on iSCSI and SYNC NICs
tweaks applied:
netsh int tcp set supplemental template=datacenter
netsh int tcp set global rss=enabled
netsh int tcp set global chimney=enabled
netsh int tcp set global dca=enabled
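(A quick way to confirm these tweaks actually took effect is to dump the TCP settings back out afterwards; this is just a verification sketch, and note that Chimney Offload is deprecated on newer Windows builds:)

```shell
:: Verify the applied TCP tweaks from an elevated command prompt
:: Shows RSS, Chimney Offload, and DCA state
netsh int tcp show global

:: Shows which congestion-control template (e.g. datacenter) is active
netsh int tcp show supplemental
```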

Storage / iSCSI
(4) SSD RAID-10, 64k stripe
iSCSI Round Robin multipathing, Disk.DiskMaxIOSize set to 512, Round Robin IOPS limit set to 1
(1) StarWind HA disk, 740 GB, Thick Provision, Write-Through Cache 4GB
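(For reference, the multipathing settings above can be applied from the ESXi shell roughly as follows; this is a sketch, and the naa.* device ID is a placeholder for the actual StarWind LUN:)

```shell
# List NMP devices to find the StarWind LUN's naa identifier (placeholder below)
esxcli storage nmp device list

# Set the path selection policy to Round Robin for that device
esxcli storage nmp device set --device=naa.xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR

# Switch Round Robin to an IOPS limit of 1 so paths alternate on every I/O
esxcli storage nmp psp roundrobin deviceconfig set \
    --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=1

# Cap the maximum I/O size passed down to the device at 512 KB
esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 512
```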

Re: Sync connection with partner node lost

Posted: Mon Oct 03, 2016 9:51 pm
by vhost310
Any advice on what might cause the sync channel to drop for a couple of seconds? I'm not sure how to proceed with diagnosing this, and I'm not confident moving forward into production until I do. Thank you

Re: Sync connection with partner node lost

Posted: Tue Oct 04, 2016 8:25 am
by Michael (staff)
Hello vhost310,
Could you please double-check that you can ping one StarWind VM from the other over both the iSCSI and SYNC channels with Jumbo frames?
Basically, it should be like: ping -f -l 8000 X.X.X.X, where X.X.X.X is the IP address of the partner VM.
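(Since the drops here are intermittent, a single ping may pass while the problem still exists; a continuous jumbo ping left running on each channel can catch a brief drop when it happens. This is a sketch assuming a 9000-byte MTU; 10.10.30.11 is the sync-channel IP from the error log above, so adjust per channel:)

```shell
:: Continuous jumbo-frame ping with Don't Fragment set, to catch intermittent drops.
:: 8972 = 9000-byte MTU minus the 20-byte IP header and 8-byte ICMP header.
:: Leave running during heavier load and watch for "Request timed out" entries.
ping -f -l 8972 -t 10.10.30.11
```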

Re: Sync connection with partner node lost

Posted: Wed Oct 05, 2016 3:05 am
by vhost310
Thank you for your help. Yes, I'm able to ping using Jumbo Frames on both channels without issue.

Re: Sync connection with partner node lost

Posted: Wed Oct 05, 2016 8:10 am
by Michael (staff)
Could you please open a support case by filling in the support form: https://www.starwindsoftware.com/support-form
Also, could you please collect the logs as I asked in the PM?
Thank you!

Re: Sync connection with partner node lost

Posted: Wed Oct 05, 2016 7:06 pm
by vhost310
Thank you, I'm opening a support case and I've replied to the PM.

Re: Sync connection with partner node lost

Posted: Thu Oct 06, 2016 1:34 pm
by Al (staff)
Hello vhost310,

Thank you. We have received your logs.

We will update the community as soon as we have the results.