ESXi iSCSI initiator WRITE speed

Software-based VM-centric and flash-friendly VM storage + free version


CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Mon Mar 28, 2011 8:39 pm

As stated here http://www.starwindsoftware.com/forums/ ... tml#p12990, I'm in the process of testing some scenarios for a new lab environment.

To test the maximum performance of the iSCSI setup I started quite simple:
One Dell PowerEdge 2950 server (2x dual-core Xeon 3.0, 4 GB RAM, 2x 73 GB SAS 15K local, iSCSI offload) running W2k8 R2 + StarWind 5.6.
One Dell MD1000 disk enclosure connected to the above server using dual 4GB SAS lines to a Dell PERC 5 with 128 MB cache and battery backup, write-back enabled. For testing purposes I created 2 arrays:
- 6-disk RAID50 SAS 15K array
- 4-disk RAID5 SATA 7.2K array
One Dell PowerEdge 2950 server (2x dual-core Xeon 3.0, 12 GB RAM, 2x 146 GB SAS 15K local, iSCSI offload) running VMware ESXi 4.1 Update 1

Both servers are connected directly via two gigabit Ethernet links (iSCSI A and B) using the onboard Broadcom gigabit NICs; this eliminates switch misconfiguration problems and the like. Additional Intel PRO/1000 dual-port NICs are used for the management LAN. Jumbo frames are enabled on both the StarWind side and the VMware side (vSwitch and VMkernel interface).
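A quick way to confirm that jumbo frames really survive end-to-end on these iSCSI links is a non-fragmentable 8972-byte ping from the StarWind box to the ESXi VMkernel address (9000 bytes minus 28 bytes of IP/ICMP headers is the largest payload a 9000-byte MTU can carry). A minimal sketch, assuming a placeholder VMkernel IP; the ESXi-side counterpart would be vmkping -d -s 8972 <StarWind iSCSI IP>:

```python
# Sketch: verify jumbo frames from the Windows/StarWind side by pinging the
# ESXi VMkernel iSCSI address with a Don't-Fragment, 8972-byte payload.
# The IP below is a placeholder -- substitute the VMkernel IP on iSCSI A/B.
import subprocess

VMKERNEL_IP = "192.168.10.20"  # hypothetical ESXi VMkernel address

result = subprocess.run(
    ["ping", "-f", "-l", "8972", "-n", "4", VMKERNEL_IP],  # -f = DF bit, -l = payload size
    capture_output=True, text=True
)
print(result.stdout)
if "Packet needs to be fragmented" in result.stdout:
    print("Jumbo frames are NOT active on every hop of this path.")
```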
On the StarWind side I created 4 targets: Quorum (512 MB, SAS), Cluster Data 1 (50 GB, SAS), Cluster Data 2 (1 TB, SATA), and ESXi Data (75 GB, SAS); all targets are configured with cache mode Normal (no caching). I also made the TCP/IP setting changes as recommended.
All SAS traffic is going through iSCSI A and all SATA traffic is going through iSCSI B. At the moment I'm only testing SAS.

On the VMware side I created 4 W2k8 R2 virtual machines (a DC, 2 file servers running in cluster mode, and one standalone domain member). All OSes run off local disks.
The file servers connect to the MD1000 using the MS iSCSI initiator; the standalone server has a second 50 GB disk connected to the MD1000 through the ESXi iSCSI initiator.

The big issue I'm having is a significant performance difference between the W2k8 iSCSI initiator and the ESXi initiator.

The virtual machines using the W2k8 initiator seem to perform nicely and stably:


The virtual machines using the ESXi initiator... that's a different story. Using direct I/O there are no problems (except the 128K result is a bit strange), but using the cache the write speed is horrible:


To be sure, I also tested this on a Win7 VM, and at the moment I am installing W2k8 on a VM using the iSCSI storage as its OS disk. The install has been running for 25 minutes now and it isn't even halfway. Using local storage (or an Openfiler iSCSI target), the complete install of the Enterprise edition finishes within 15 minutes.

Any clue what this could be? The iSCSI path is exactly the same in both cases. The only difference is that the W2k8 initiator connects through a VM network to the vSwitch, while the ESXi initiator connects through a VMkernel network to the vSwitch.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Mar 29, 2011 9:45 am

Did you read this thread:

http://www.starwindsoftware.com/forums/ ... t2386.html

And did you follow the suggestions from here:

http://sustainablesharepoint.wordpress. ... ith-iscsi/

It references Hyper-V, but the TCP stack tweaks INSIDE guest Windows virtual machines are the same.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

@ziz (staff)
Posts: 57
Joined: Wed Aug 18, 2010 3:44 pm

Tue Mar 29, 2011 9:49 am

Hi,
What is the stripe size used for your RAID arrays?
Did you apply the recommended settings for the ESX iSCSI initiator? Here is the link: http://www.starwindsoftware.com/forums/ ... t2296.html
Actually, the issue seems to be related to mismatched block sizes at the different levels of your SAN: you may use one stripe size on your RAID arrays, another block size for your VMFS, Windows will use a third block size, and the cache uses 64K as its block size, and as a result you get low write performance.
The solution is to use the same block size at every level. On our side, we are working on a new build that will let you choose the block size for the cache; we will release it soon.
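To illustrate the block-size mismatch: when the guest file system uses a small cluster size, writes land at offsets that are not multiples of the 64K stripe element or cache block, so even a 64K write can straddle two segments, and every partially touched segment costs a read-modify-write. A minimal sketch of that arithmetic (illustrative only, not StarWind code):

```python
# Sketch: count how many 64 KB segments (stripe elements or cache blocks)
# a single write overlaps, depending on how well its offset is aligned.
SEGMENT = 64 * 1024  # 64 KB stripe element / cache block size

def segments_touched(offset: int, length: int, segment: int = SEGMENT) -> int:
    """Number of segments the byte range [offset, offset + length) overlaps."""
    first = offset // segment
    last = (offset + length - 1) // segment
    return last - first + 1

# A 64 KB write on a 64 KB boundary stays within one segment...
print(segments_touched(offset=0, length=64 * 1024))         # -> 1
# ...but the same write shifted by one 4 KB cluster spills into two segments,
# each needing a partial (read-modify-write) update on the target side.
print(segments_touched(offset=4 * 1024, length=64 * 1024))  # -> 2
```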
Aziz Keissi
Technical Engineer
StarWind Software
CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Tue Mar 29, 2011 9:46 pm

I indeed found that thread after my first tests and added TcpAckFrequency to the Windows registry.
This greatly improves the MS iSCSI initiator's read and write speeds at transfer sizes up to 32K and slightly improves the read speeds above 32K (direct I/O).
The tests above were made after changing TcpAckFrequency.

Since this is only possible for iSCSI network connections within Windows, it isn't a solution for disks using the ESXi initiator.
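For reference, the TcpAckFrequency tweak is set per network interface under the Tcpip\Parameters\Interfaces registry key. A minimal sketch, assuming the placeholder GUID is replaced with the iSCSI NIC's interface GUID and the script runs elevated (a reboot or NIC restart is needed before it takes effect):

```python
# Sketch: set TcpAckFrequency = 1 (disable delayed ACK) on one interface.
# IFACE_GUID is a placeholder -- use the GUID of the iSCSI NIC, which can be
# found by browsing the Interfaces key in regedit.
import winreg

IFACE_GUID = "{00000000-0000-0000-0000-000000000000}"  # placeholder GUID
KEY_PATH = (r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"
            "\\" + IFACE_GUID)

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "TcpAckFrequency", 0, winreg.REG_DWORD, 1)
```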
anton (staff) wrote:Did you read this thread:

http://www.starwindsoftware.com/forums/ ... t2386.html

And did you follow the suggestions from here:

http://sustainablesharepoint.wordpress. ... ith-iscsi/

It references Hyper-V, but the TCP stack tweaks INSIDE guest Windows virtual machines are the same.
CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Tue Mar 29, 2011 10:10 pm

The stripe size for the RAID array is 64K, with Read Ahead disabled and Write Back enabled. Disk cache is disabled for the SAS array and enabled for the SATA array (I tested this before, and enabling the disk cache on the SAS array seems to hurt performance).

When using a single link per iSCSI target, the only ESX iSCSI initiator setting that applies to this setup would be the disk timeout value, correct?

Isn't it strange that the Windows initiator disks, which use exactly the same settings, have no write speed problems? Both use the same array and thus the same stripe size and cache policy. Both have the same target settings within StarWind. Both use the same network path up to the vSwitch.
Doesn't that mean the problem has to be somewhere within the VMFS layer or its related settings (it is configured with a 1 MB block size)?

Changing settings within the VMware virtual machine OS wouldn't be the solution, since that can't be done until after installation. Because I'm going to use VMFS disks on the SAN as OS disks on the ESXi lab server, slow write speeds during installation aren't an option.

Your remark about the cache block size got me thinking. Currently I have disabled all caching on the StarWind targets. I tested the different settings using the MS iSCSI initiator and found no significant change in read/write speed, so I also disabled the cache on the ESXi target, which avoids problems in case of hardware or UPS power failures.
I will test again using write-back cache and also check the Windows allocation unit size (currently the 4K default on both the MS and ESXi target disks, I think).
@ziz (staff) wrote:Hi,
What is the stripe size used for your RAID arrays?
Did you apply the recommended settings for the ESX iSCSI initiator? Here is the link: http://www.starwindsoftware.com/forums/ ... t2296.html
Actually, the issue seems to be related to mismatched block sizes at the different levels of your SAN: you may use one stripe size on your RAID arrays, another block size for your VMFS, Windows will use a third block size, and the cache uses 64K as its block size, and as a result you get low write performance.
The solution is to use the same block size at every level. On our side, we are working on a new build that will let you choose the block size for the cache; we will release it soon.
CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Tue Mar 29, 2011 11:26 pm

Some developments:

I recreated the ESXi target using Write-Back cache mode within StarWind and also formatted the drive with a 64K allocation unit size within the virtual machine.

Using direct I/O there is an improvement in both read and write speeds for transfer sizes up to 32K, and the read performance dip at 128K also disappears.


Using non-direct I/O there is a general improvement in write speed, although it's still far from what it should be.


Any thoughts?

The next step tomorrow will be to reformat all MD1000 drives on the StarWind box (these of course also use the standard 4K allocation size, since I didn't change that setting) and thus recreate all targets from scratch. I'm curious how this will affect performance on both the MS and ESXi initiators.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Wed Mar 30, 2011 9:14 am

Great! Moving in the right direction :))

I've been talking about the Windows cache manager's cache line size. Currently a cache line holds up to 64KB, so requests larger than 64KB require push/pop operations on two, three, or more cache entries (a 256KB request, for example, maps to four 64KB lines). We're reworking it to handle a variable cache line size, so there should be no processing overhead compared to 64KB requests.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Wed Mar 30, 2011 9:42 am

One question so far... if you increase the Queue Depth from the default of 4 to, say, 8 or 16, what numbers do you get?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Thu Mar 31, 2011 4:19 pm

I changed the block size on the StarWind box to 64K and formatted the iSCSI drives in the virtual machines with a 64K block size; the array also has a 64K stripe size.

Then I performed the following benchmarks:
1. StarWind cache Normal, no TcpAckFrequency registry edit.

2. StarWind cache Normal, TcpAckFrequency registry edit on the virtual machine that runs the MS iSCSI initiator. As discovered before, performance at the small transfer sizes increases.
(After that, I also tried the TcpAckFrequency edit on the StarWind box, but this slightly decreased performance with both the MS and ESXi initiators, so I left it only on the MS iSCSI initiator VM.)

3. Time to do some testing with the StarWind cache. Enabled it for all targets: Write-Back, 512 MB. As before, little influence on the MS iSCSI initiator disks and a big influence on the ESXi initiator, but again, not nearly the performance you would expect.

The graphs:
MS Initiator:


ESXi Initiator:


So, in the end: using a 64K block size has some influence, but the hosts using the ESXi initiator still don't reach the expected speeds (especially in the non-direct I/O tests).
Increasing the QD from 4 to 8 doesn't improve things much either.
Last edited by CyberNBD on Thu Mar 31, 2011 6:45 pm, edited 2 times in total.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Thu Mar 31, 2011 5:28 pm

1) You cannot change the cache line size yourself. You need to wait for our engineers to change StarWind's Cache Manager to support this. So what you did is only half of the work; let us do our part for now.

2) I've been talking about altering the Queue Depth. ESX usually has at least 6-8 I/Os pending, with a request size of 64KB for reads and 32KB for writes. Make ATTO show results for an I/O pattern more or less close to your hypervisor load.
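As a rough illustration of why queue depth matters on a latency-bound link: throughput scales roughly as queue depth x request size / round-trip latency, so a shallow queue cannot keep a gigabit link busy while QD 8 with 64 KB requests easily can. A back-of-the-envelope sketch, assuming an illustrative 1 ms per-I/O round trip (not a measured value from this setup):

```python
# Sketch: latency-bound throughput estimate. The 1.0 ms round-trip figure is an
# assumption for illustration only; real throughput is also capped by the
# ~110-118 MB/s gigabit wire speed.
def throughput_mb_s(queue_depth: int, io_size_kb: int, latency_ms: float) -> float:
    ios_per_second = queue_depth / (latency_ms / 1000.0)
    return ios_per_second * io_size_kb / 1024.0

for qd in (1, 4, 8):
    print(f"QD {qd}: ~{throughput_mb_s(qd, io_size_kb=64, latency_ms=1.0):.0f} MB/s")
# QD 1: ~62 MB/s, QD 4: ~250 MB/s, QD 8: ~500 MB/s (the wire speed becomes the limit)
```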
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Thu Mar 31, 2011 6:44 pm

1) OK

2) I also tested a Queue Depth of 8 using ATTO (the maximum is 10) on the ESXi initiator disk. I added the graphs to the previous post.
anton (staff) wrote:1) You cannot change the cache line size yourself. You need to wait for our engineers to change StarWind's Cache Manager to support this. So what you did is only half of the work; let us do our part for now.

2) I've been talking about altering the Queue Depth. ESX usually has at least 6-8 I/Os pending, with a request size of 64KB for reads and 32KB for writes. Make ATTO show results for an I/O pattern more or less close to your hypervisor load.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Thu Mar 31, 2011 8:12 pm

Now I'm a little bit lost... what's the difference between what you do in the "direct I/O" and "I/O" graphs? (The first ones are fine and the second ones are not.)

1) What's "normal" mode for cache? Write-Thru? It's not normal. Write-Back is default mode and Write-Thru is for non-HA (safety mode).

2) Why do you allocate only 512MB of it? Is the Windows machine running StarWind short on memory and doing heavy paging under I/O load? This is *CRITICAL*.

3) There's no sense in testing any request larger than 128-256KB: Windows breaks all I/Os into smaller sizes, and the same happens with iSCSI. I don't mean they shouldn't work or that we're not going to do anything about it - quite the opposite. The last graphs look weird and we'll definitely have to make them match the ones you published before them.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Thu Mar 31, 2011 9:17 pm

Direct I/O means the test is performed without system caching. At first glance that should mean there's something wrong on the initiator system side, but:
- when I use the MS initiator I don't have this performance problem
- using other iSCSI target software I don't have the problem
- performing a clean OS install on a VM using the StarWind iSCSI target takes forever compared to local storage or another software target.

1) According to the StarWind software, cache mode Normal means no caching. It's the default setting when I create a target, isn't it?

2) No specific reason for this. I have multiple targets, so I divided the available system memory amongst them. I'm going to disable caching on the MS targets anyway because it makes no difference in performance there, so I will test the ESXi target again using a larger cache, 2048 MB perhaps (the StarWind box has 4 GB of memory for now).

3) All right. I hope this can be solved; reading some other topics, I'm not the only one having these VMware performance issues.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Thu Mar 31, 2011 10:01 pm

1) How do you toggle "system caching" ON and OFF? Is it an ESX setting or something on the host Windows box?

2) If you HAD found a config pattern matching your Windows-to-Windows numbers, what made you keep looking? In other words: what exactly are we trying to fix?

OK, "Normal" means "No Cache".

I just wanted to make sure you aren't putting the target machine into a swapping condition. When you run the test I/O, what does Task Manager on the target machine show for CPU/memory usage (just to be sure)?

Absolutely true.
CyberNBD wrote:Direct I/O means the test is performed without system caching. At first glance that should mean there's something wrong on the initiator system side, but:
- when I use the MS initiator I don't have this performance problem
- using other iSCSI target software I don't have the problem
- performing a clean OS install on a VM using the StarWind iSCSI target takes forever compared to local storage or another software target.

1) According to the StarWind software, cache mode Normal means no caching. It's the default setting when I create a target, isn't it?

2) No specific reason for this. I have multiple targets, so I divided the available system memory amongst them. I'm going to disable caching on the MS targets anyway because it makes no difference in performance there, so I will test the ESXi target again using a larger cache, 2048 MB perhaps (the StarWind box has 4 GB of memory for now).

3) All right. I hope this can be solved; reading some other topics, I'm not the only one having these VMware performance issues.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

CyberNBD
Posts: 25
Joined: Fri Mar 25, 2011 10:56 pm

Thu Mar 31, 2011 10:27 pm

1) It's an ATTO setting: you can check or uncheck the "Direct I/O" box within ATTO before testing.

2) The other target software I tested was Openfiler. It works and performs well out of the box, but it doesn't support Windows 2008 R2 clustering (something to do with persistent reservations), so that made me look for other solutions. I have no specific preference for Linux- or Windows-based SAN software, BUT it has to perform well for both Windows and ESX(i) initiators. Some Windows boxes are going to connect directly through the MS initiator because of the clustering. All other hosts (a mix of Windows and Linux) will use ESXi disks and thus the ESXi iSCSI initiator.

All right. Most of the time I have a Task Manager window open during testing, but I haven't seen anything abnormal on the StarWind box: CPU usage is around 0-3%, with occasional peaks to 10%, and there is no memory swapping either. Memory fills up nicely when testing with cache mode enabled. I also took a look at Resource Monitor (disk queue length etc.) but saw nothing abnormal there either.

I'm testing with the 2048 MB write-back cache right now. There is some improvement in ATTO. A clean W2k8 install goes faster compared to no cache, but it still took about 40 minutes to complete, where I expect 15 to at most 20 minutes based on previous tests.