Starwind for vSphere - Underlying device response time

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

frankyyy02
Posts: 10
Joined: Wed Aug 14, 2019 9:14 pm

Sat Nov 16, 2019 11:09 pm

Hi

I have been testing StarWind for vSphere for a while on a small cluster. After upgrading to the most recent version I noticed a small number of alerts relating to write delays and similar, such as:

Device iqn.2008-08.com.starwindsoftware:172.16.250.51-storage-1400gb: command "WRITE". Partner "iqn.2008-08.com.starwindsoftware:172.16.250.52-storage-1400gb" request response time is longer than expected. Response time is 12 sec.

These appeared to be random, occurring a couple of times a week, and each one sorted itself out within a second or two. The vSphere logs show drops to the iSCSI datastore (both paths, local and the 2nd host). The configuration is as follows:

* 2 hosts (Supermicro E200-8D)
* Each host has 64GB RAM
* Each host has an INTEL D1 P4101 2TB NVME PCIE3X4 M.2 22X80MM for the SW data store with a 1.5TB SW store created
* 1 x 10Gb NIC dedicated to Sync via direct connection between each
* 1 x 10Gb NIC dedicated to iSCSI and vMotion via direct connection between each
* 2 x 1Gb NICs for operational ESXi traffic

The SW data store is configured with a 1.4TB size, in write-back mode with a 2GB cache.

For the last two days we have been running some import processes in Elasticsearch and noticed that these errors became more frequent. They still appeared to be random, sometimes occurring even when nothing was actually running, but also more often as more traffic was being generated.

Code:

2019-11-17 9:41:00: HA Device iqn.2008-08.com.starwindsoftware:172.16.250.52-storage-1400gb: command "WRITE". Underlying device response time is longer than expected. Response time is 12 sec.
2019-11-17 9:41:00: HA Device iqn.2008-08.com.starwindsoftware:172.16.250.51-storage-1400gb: command "WRITE". Partner "iqn.2008-08.com.starwindsoftware:172.16.250.52-storage-1400gb" request response time is longer than expected. Response time is 15 sec.
2019-11-17 9:41:00: HA Device iqn.2008-08.com.starwindsoftware:172.16.250.51-storage-1400gb: command "0x89". Request execution time is longer than expected. Response time is 17 sec.
Note, these are just from the email logs. I have hundreds of these now, almost all from the last hour.
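For anyone triaging a pile of these alert emails, here is a minimal Python sketch that groups them by which side reported the delay (underlying device, partner, or overall request) and records the worst response time for each. The regex is inferred only from the sample alerts in this post, so other alert wordings would simply be skipped. The names `parse_alert`, `summarize` and `SAMPLE` are made up for illustration and are not part of any StarWind tooling.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample alerts in this post (an assumption,
# not an official log format). Timestamp and "HA " prefix are optional.
ALERT_RE = re.compile(
    r'^(?:(?P<timestamp>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}): )?'
    r'(?:HA )?Device (?P<device>\S+): '
    r'command "(?P<command>[^"]+)"\. '
    r'(?P<what>.+?) is longer than expected\. '
    r'Response time is (?P<seconds>\d+) sec\.'
)

SAMPLE = [
    '2019-11-17 9:41:00: HA Device iqn.2008-08.com.starwindsoftware:172.16.250.52-storage-1400gb: command "WRITE". Underlying device response time is longer than expected. Response time is 12 sec.',
    '2019-11-17 9:41:00: HA Device iqn.2008-08.com.starwindsoftware:172.16.250.51-storage-1400gb: command "WRITE". Partner "iqn.2008-08.com.starwindsoftware:172.16.250.52-storage-1400gb" request response time is longer than expected. Response time is 15 sec.',
    '2019-11-17 9:41:00: HA Device iqn.2008-08.com.starwindsoftware:172.16.250.51-storage-1400gb: command "0x89". Request execution time is longer than expected. Response time is 17 sec.',
]


def classify(what):
    """Label an alert by the wording that precedes 'is longer than expected'."""
    if what.startswith("Underlying device"):
        return "underlying"
    if what.startswith("Partner"):
        return "partner"
    return "request"


def parse_alert(line):
    """Return a dict describing one alert line, or None if it does not match."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m["timestamp"],  # None when the line has no timestamp
        "device": m["device"],
        "command": m["command"],
        "kind": classify(m["what"]),
        "seconds": int(m["seconds"]),
    }


def summarize(lines):
    """Count alerts and track the longest response time per alert kind."""
    stats = defaultdict(lambda: {"count": 0, "max_seconds": 0})
    for line in lines:
        alert = parse_alert(line)
        if alert is None:
            continue  # not an alert line in the assumed format
        entry = stats[alert["kind"]]
        entry["count"] += 1
        entry["max_seconds"] = max(entry["max_seconds"], alert["seconds"])
    return dict(stats)


if __name__ == "__main__":
    for kind, entry in summarize(SAMPLE).items():
        print(kind, entry["count"], entry["max_seconds"])
```

Comparing the "underlying" and "partner" counts per node may help show whether the delays start at one host's local disk or on the sync path, though that reading of the alert wording is my assumption.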

The ESXi logs also reflect the iSCSI datastore drop. Previously it reconnected almost instantly and got back in sync. In the most recent instance it didn't resync straight away, but after about 25 minutes a FAST Resync was performed automatically and it's back in sync.
I should also note that I was running with jumbo frames and recently switched back to an MTU of 1500 on the Sync and iSCSI 10Gb links to see if it made any difference, but it doesn't appear to.

I am currently on a trial licence, but I'm hoping support can help shed some light on the response time issues. The number of VMs is small overall, with only 2 small Windows VMs (a domain controller and similar), vCenter and the vSAN VMs, so the cluster is not being taxed at all. The Elasticsearch imports would certainly be generating more traffic now, but with all flash I wouldn't have thought it would be raising so many delayed write errors.

Thanks
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Wed Nov 20, 2019 10:43 am

How are NVMe drives utilized by the StarWind VMs? Are they configured as pass-through or have you created VMDKs on top of those disks?