Sync connection with partner node lost

vhost310
Posts: 5
Joined: Tue Sep 27, 2016 10:29 pm

Tue Sep 27, 2016 11:34 pm

Thanks in advance for any assistance. I've recently put together a 2-node hyperconverged setup using VMware ESXi 6 and Windows Server 2012 R2 Datacenter. Hardware and config specs are below. I'm getting occasional errors in the StarWind VMs stating that the connection to the partner node on the sync channel is lost (see below). About 2 seconds later it's reestablished and all is well. This seems to happen at times of heavier load, though all that's running is an Exchange Server VM with 6 users. Everything seems to be running well and fast so far except for the occasional disconnects. Please let me know what other configuration specs I can provide. Thank you

Errors
High Availability Device iqn….swd01, all synchronization Connections with Partner Node iqn.….swd01 lost

High Availability Device "iqn.…swd01", critical device response time has been detected. (IO operation delay is more than 10 sec). To avoid performance issues on the whole device, automatic synchronization attempts will be performed every 30 minute(s).

High Availability Device iqn.….swd01, current Node passed to "Not synchronized" State, Reason is Synchronization Partner iqn.….swd01 Channel has been disconnected due to Timeout on local Storage Device

High Availability Device iqn….swd01, current Node State has changed to "Not synchronized"

then 2 seconds later:
High Availability Device iqn….swd01, synchronization Connection IP 10.10.30.11 with Partner Node iqn.….swd01 established

Hardware
(2) Supermicro SuperServer 2028R-C1RT4+
(1) Xeon E5-2620 v4
(2) Samsung 32GB 288-pin DDR4 2400 ECC RAM
(1) ESXi BOOT - SanDisk x400 128GB SSD SD8SB8U
(4) RAID-10 - Samsung SM863 480GB SSD
(1) LSI 3108 Onboard RAID Controller
(4) 10G Intel NICs
VMware vSphere 6 Enterprise Plus

Network
(1) Direct 10G connection for iSCSI
(1) Direct 10G connection for SYNC
(1) Connection to 1G switch for WAN, Heartbeat
(1) Connection to 1G switch for NAS/backup traffic
Cables - Tripplite Cat6a Shielded, 3ft
Jumbo frames enabled on direct iSCSI and SYNC connections inside SW VMs and in VMware

StarWind VMs
4 vCPU, 8GB RAM, Paravirtual SCSI, VMXNET3 NICs, Windows Server 2012 R2 Datacenter
VMware Disk - 750GB Thick Provision, Eager Zeroed
NETBIOS and DNS registration disabled on iSCSI and SYNC NICs
tweaks applied:
netsh int tcp set supplemental template=datacenter
netsh int tcp set global rss=enabled
netsh int tcp set global chimney=enabled
netsh int tcp set global dca=enabled

Storage / iSCSI
(4) SSD RAID-10, 64k stripe
iSCSI Round Robin path policy, Disk.DiskMaxIOSize set to 512, Round Robin IOPS limit set to 1
(1) StarWind HA disk, 740 GB, Thick Provision, Write-Through Cache 4GB
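
Since the errors above mention "Timeout on local Storage Device" and an IO delay of more than 10 sec, one rough way to look for latency spikes on the underlying storage is to time small synchronous writes from inside the VM. The following is only a minimal sketch, not a StarWind tool: the function names, the 64 KiB block size (chosen to match the 64k stripe), and the 1-second threshold are all assumptions made for illustration.

```python
import os
import tempfile
import time


def probe_write_latency(directory, samples=50, block_size=64 * 1024):
    """Time synchronous writes in `directory`; return each duration in seconds."""
    payload = os.urandom(block_size)
    durations = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this write to the device
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return durations


def summarize(durations, threshold_s=1.0):
    """Count samples slower than threshold_s (an assumed, arbitrary cutoff)."""
    slow = [d for d in durations if d > threshold_s]
    return {"samples": len(durations), "max_s": max(durations), "slow": len(slow)}


if __name__ == "__main__":
    # Hypothetical target: point this at a directory on the disk under test.
    print(summarize(probe_write_latency(tempfile.gettempdir())))
```

Running it during the heavier-load periods and comparing the maximum against idle periods may show whether the local disk stalls at the same time the sync channel drops. It does not replace the StarWind logs.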
vhost310
Posts: 5
Joined: Tue Sep 27, 2016 10:29 pm

Mon Oct 03, 2016 9:51 pm

Any advice on what might cause the sync channel to drop for a couple of seconds? I'm not sure how to go about diagnosing this, and I'm not confident moving forward into production until I do. Thank you
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Tue Oct 04, 2016 8:25 am

Hello vhost310,
Could you please double-check that you can ping one StarWind VM from the other over both the iSCSI and SYNC channels with Jumbo frames?
The command should look like this: ping -f -l 8000 X.X.X.X, where X.X.X.X is the IP address of the partner VM.
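
As a side note on the payload size: an 8000-byte ping with the Don't Fragment flag already shows that frames larger than the standard 1500-byte MTU get through. To test the full path MTU, the payload has to be the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below works that out and builds the matching Windows ping command. The 9000-byte MTU is an assumption (the thread only says jumbo frames are enabled), and the helper names are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def windows_ping_command(target, mtu=9000):
    """Build a Windows ping command: -f sets Don't Fragment, -l sets payload size.

    The default MTU of 9000 is an assumption; use the value configured on the NICs.
    """
    return f"ping -f -l {max_icmp_payload(mtu)} {target}"


if __name__ == "__main__":
    print(max_icmp_payload(1500))           # standard Ethernet MTU
    print(max_icmp_payload(9000))           # assumed jumbo MTU
    print(windows_ping_command("X.X.X.X"))  # placeholder address from the post
```

If the ping with the computed payload succeeds but one a byte larger fails with a fragmentation message, the configured MTU is being honored end to end on that channel.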
vhost310
Posts: 5
Joined: Tue Sep 27, 2016 10:29 pm

Wed Oct 05, 2016 3:05 am

Thank you for your help. Yes, I'm able to ping using Jumbo Frames on both channels without issue.
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Wed Oct 05, 2016 8:10 am

Could you please open a support case by filling out the support form: https://www.starwindsoftware.com/support-form
Also, could you please collect the logs as I asked in my PM?
Thank you!
vhost310
Posts: 5
Joined: Tue Sep 27, 2016 10:29 pm

Wed Oct 05, 2016 7:06 pm

Thank you, I'm opening a support case and I've replied to the PM.
Al (staff)
Staff
Posts: 43
Joined: Tue Jul 26, 2016 2:26 pm

Thu Oct 06, 2016 1:34 pm

Hello Vhost310,

Thank you. We have received your logs.

We will update the community as soon as we have results.