Mounting Stuck At 100% and slow loading

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Anatoly (staff), Max (staff)

McGuffin
Posts: 3
Joined: Thu Nov 05, 2015 7:53 pm

Thu Nov 05, 2015 8:28 pm

New to StarWind, but an old IT dog with lots of iSCSI experience.

I'm running build 8.0.8198 and have been experiencing lots of problems with my SAN. I'm coming from a straight Windows iSCSI SAN with PrimoCache, where I was used to performance of around 100-200 MBps. My setup is as below:

SAN:
Windows 2012 R2
32 GB RAM
AMD A-8 5500 CPU
5 x WD 2TB Red Hard drives in RAID 5
1 x 64 GB SSD for L2 cache
2 x Intel Gigabit CT NIC with flow control and jumbo frames enabled

ESX hosts:
ESX 6.0U1
64 GB RAM each
2 x 8 Core XEON Procs each
HP NC375i Quad GB NIC each (for iSCSI)
MPIO enabled with Round Robin. Frame size on vSwitch and vmKernel set to 9000

iSCSI Switch:
Linksys LGS308 with two VLANs:
VLAN1 and VLAN2 are for iSCSI. One leg from the SAN and each host in each VLAN.
Jumbo Frames and Flow Control enabled

Symptoms:
When rebooting my SAN (after powering off all VMs and hosts), LSFS volumes take a long time to mount (2+ hours) and seem to be stuck at 100%:
[Screenshot: Untitled.png, mounting stuck at 100%]
iSCSI performance also seems to be very poor. Tests with HDTune show 15-20 Mbps for a single LSFS volume, and 50-60 Mbps for a vRAM drive on the SAN presented as a target. iPerf between a VM and the SAN with the frame size set to 9000 shows 700-800 Mbps (~97 MBps), so the network itself looks fine. Basically, it appears that the problem is with the SAN, and the key difference is going from MS iSCSI + PrimoCache to StarWind. If anyone has input I would be grateful.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Thu Nov 05, 2015 8:52 pm

Hi McGuffin,

Such a difference is very weird, especially for LSFS devices. Have you tested it locally on the SAN, i.e. just connecting StarWind's iSCSI targets over 127.0.0.1 with a couple of sessions and running HDTune or IOMeter against them? Just to make sure it's not the network :)

And yeah... please share the LSFS device size, L1 cache size, and L2 cache size, if any, of course.
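
Something along these lines with the built-in Windows iSCSI cmdlets should do it for the loopback connection (the target IQN below is only a placeholder, substitute your own):

# Point the Microsoft initiator at the local StarWind service
New-IscsiTargetPortal -TargetPortalAddress 127.0.0.1
# List the targets StarWind exposes and note the IQN of the test one
Get-IscsiTarget
# Connect the test target over loopback (example IQN)
Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:sanhost-testlun" -IsPersistent $false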
McGuffin
Posts: 3
Joined: Thu Nov 05, 2015 7:53 pm

Thu Nov 05, 2015 11:20 pm

Created a test target. 20 GB, no cache, flat file (not LSFS).

Then I connected to it via local iSCSI initiator (127.0.0.1).

ATTO test results to the RAID volume directly:
[Screenshot: Direct to RAID Volume.png]
ATTO test results to the Target via loopback:
[Screenshot: To Loopback iSCSI target.png]
As you can see, there is a serious discrepancy between the two. Additionally, the network should have nothing to do with the slow mount times...
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Tue Nov 10, 2015 11:35 am

I've never faced such strange behavior. Normally this kind of device should perform only 5-10% slower than a local one.

Try connecting additional iSCSI sessions (e.g. 2-4 sessions over loopback). I've noticed that sometimes an internal iSCSI limitation can lead to this kind of performance discrepancy.
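
With the PowerShell initiator cmdlets, extra sessions are just repeated connects with multipathing enabled (MPIO has to be installed first, and the IQN is again only a placeholder):

# One-time MPIO setup, then reboot
Install-WindowsFeature Multipath-IO
New-MSDSMSupportedHw -VendorId MSFT2005 -ProductId iSCSIBusType_0x9
# Each Connect-IscsiTarget with -IsMultipathEnabled adds another loopback session
Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:sanhost-testlun" -IsMultipathEnabled $true
Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:sanhost-testlun" -IsMultipathEnabled $true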
McGuffin
Posts: 3
Joined: Thu Nov 05, 2015 7:53 pm

Tue Nov 10, 2015 11:24 pm

I noticed there was a new build, so I completely upgraded my SAN to build 8.0.8716. There wasn't any change in the behavior of the LSFS volumes on my system. I also noticed that there seems to be a lot of disk I/O to the pagefile. My C:\ drive was a slower 5400 RPM drive, so I cloned it to an SSD; that I/O is no longer there, but it still takes a long time to mount a single 1 TB LSFS volume.

Regarding iSCSI performance, I ran four threads against my two 1Gbps NICs.

Here are the commands I sent from the client (also two 1Gbps NICs):
C:\iPerf\iperf3.exe -c 192.168.200.10 -l 9000 -t 120
C:\iPerf\iperf3.exe -c 192.168.200.10 -p 5202 -l 9000 -t 120
C:\iPerf\iperf3.exe -c 192.168.210.10 -p 5203 -l 9000 -t 120
C:\iPerf\iperf3.exe -c 192.168.210.10 -p 5204 -l 9000 -t 120
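
For reference, each stream needs its own iperf3 listener on the SAN side, since a single iperf3 server instance only handles one test at a time. Something along these lines (the first listener uses the default port 5201):
C:\iPerf\iperf3.exe -s
C:\iPerf\iperf3.exe -s -p 5202
C:\iPerf\iperf3.exe -s -p 5203
C:\iPerf\iperf3.exe -s -p 5204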

Here is a view of taskman showing the throughput for the NICs:
[Screenshot: iPerf-four threads.png]
I also have a support case open (I'm still on a trial license), but no one has contacted me beyond the standard notification email. I submitted the case on Friday.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Mon Nov 16, 2015 6:06 pm

Hi McGuffin,
I've got an idea. One thing I've noticed is that your local results could be so high compared to StarWind's because of the RAID controller's internal caching. AFAIK StarWind somehow bypasses the RAID controller cache and thus shows such low results. Try testing with a larger file size (like 5-10 GB) to get past the RAID controller cache and see if the difference still persists.
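
If ATTO's maximum file size gets in the way, something like Microsoft's diskspd with a 10 GB test file would also show it (the drive letter and file name here are just examples):

diskspd.exe -c10G -d60 -r -w0 -b64K -t4 -o8 -Sh -L E:\testfile.dat

That's a 60-second random read test with 64K blocks, 4 threads and 8 outstanding I/Os per thread, with Windows file caching disabled and a test file big enough to blow past the controller cache.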
Tarass (Staff)
Staff
Posts: 113
Joined: Mon Oct 06, 2014 10:40 am

Mon Nov 23, 2015 10:37 am

We usually recommend using IOMeter for performance testing, since in our experience it gives more reliable results.

Please find our generic benchmarking guide here: https://www.starwindsoftware.com/starwi ... ice-manual
And the recommended set of IOMeter tests here: https://knowledgebase.starwindsoftware. ... -of-tests/
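
If it helps, Iometer can also be driven unattended: save the access specifications to an .icf file once in the GUI and then, if I recall the switches correctly, start the run from the command line (file names below are just examples):

IOmeter.exe /c StarWind-tests.icf /r results.csv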
Senior Technical Support Engineer
StarWind Software Inc.