Wed Mar 02, 2016 3:47 pm
All,
Here are some observations and comments on my move from standalone servers running StarWind in HA to a roll-my-own hyperconverged setup using Supermicro servers, VMware, and StarWind. The VM hosting the StarWind software (the VSA) and the test virtual machine both run Windows 2012 R2 (latest patches installed) as separate VMs on the same ESXi host. All disks in Windows were formatted with NTFS and 64K allocation units. The targets use a 5GB write-back (WB) L1 cache on a thick-image file, with no LSFS.
In all tests, I used the StarWind-recommended access profile in IOMeter, with 4 workers per target disk, configured as follows:
5 minute runtime
30 second ramp up
Access specifications (see the 4 defined below)
Maximum disk size = 20000000 sectors
# of outstanding IOs = 16
All other options set to default.
The first test was run directly on the VSA against its E drive, which is presented by the ESXi host as a standard independent virtual disk on the paravirtual adapter. The VMFS volume hosting that disk sits on five 1TB Samsung 850 EVO SSDs in a RAID0 configuration provided by an Avago MegaRAID 9361-8i. The RAID0 volume is set to Write-Through mode with a 64K stripe, Direct IO, and no read-ahead. The Avago controller is licensed for FastPath and CacheCade. The HA partner node uses an identical VSA configuration.
Target Type | Target Name | Access Specification Name | # Managers | # Workers | # Disks | IOps | Read IOps | Write IOps | MBps (Decimal) | Read MBps (Decimal) | Write MBps (Decimal) | Average Response Time (ms)
ALL | All | 4 KiB aligned; 100% Read; 100% random | 1 | 4 | 4 | 210055.4378 | 210055.4378 | 0 | 860.387073 | 860.387073 | 0 | 0.304511
ALL | All | 4 KiB aligned; 0% Read; 100% random | 1 | 4 | 4 | 184672.0686 | 0 | 184672.0686 | 756.416793 | 0 | 756.416793 | 0.346341
ALL | All | 64 KiB; 100% Read; 0% random | 1 | 4 | 4 | 37752.90699 | 37752.90699 | 0 | 2474.174513 | 2474.174513 | 0 | 1.69497
ALL | All | 64 KiB; 0% Read; 0% random | 1 | 4 | 4 | 33301.486 | 0 | 33301.486 | 2182.446187 | 0 | 2182.446187 | 1.921554
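As a sanity check on the local VSA numbers, the reported throughput should equal IOps times block size, and by Little's Law the IOps times the average response time should come out near the number of IOs kept in flight. The sketch below is not part of the original test run; it assumes the 16 outstanding IOs apply per worker (so 4 x 16 = 64 in flight) and that IOMeter's average response time is in milliseconds. The function names are made up for illustration.

```python
# Figures copied from the local VSA results above.
WORKERS = 4                  # IOMeter workers in this run
OUTSTANDING_PER_WORKER = 16  # "# of outstanding IOs" (assumed to be per worker)

# (access spec, block size in bytes, IOps, reported MBps, avg response time in ms)
ROWS = [
    ("4 KiB random read", 4 * 1024, 210055.4378, 860.387073, 0.304511),
    ("4 KiB random write", 4 * 1024, 184672.0686, 756.416793, 0.346341),
    ("64 KiB sequential read", 64 * 1024, 37752.90699, 2474.174513, 1.69497),
    ("64 KiB sequential write", 64 * 1024, 33301.486, 2182.446187, 1.921554),
]


def mbps_decimal(iops, block_bytes):
    """Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes)."""
    return iops * block_bytes / 1_000_000


def implied_outstanding(iops, latency_ms):
    """Little's Law: requests in flight = completion rate x time per request."""
    return iops * (latency_ms / 1000.0)


expected_in_flight = WORKERS * OUTSTANDING_PER_WORKER

for name, block, iops, reported_mbps, latency_ms in ROWS:
    print(
        f"{name:24s} computed {mbps_decimal(iops, block):9.3f} MBps "
        f"(reported {reported_mbps:9.3f}), "
        f"in flight ~{implied_outstanding(iops, latency_ms):5.1f} "
        f"of {expected_in_flight}"
    )
```

If the assumptions hold, every row should land close to 64 IOs in flight, which suggests the workers stayed saturated for the whole run and the latency figures are consistent with the IOps figures.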
The following test was run locally on a test virtual machine against the C drive, which is hosted on an HA target provided by the VSAs.
Target Type | Target Name | Access Specification Name | # Managers | # Workers | # Disks | IOps | Read IOps | Write IOps | MBps (Decimal) | Read MBps (Decimal) | Write MBps (Decimal) | Average Response Time (ms)
ALL | All | 4 KiB aligned; 100% Read; 100% random | 1 | 4 | 4 | 124257.61 | 124257.61 | 0 | 508.95917 | 508.95917 | 0 | 0.51479
ALL | All | 4 KiB aligned; 0% Read; 100% random | 1 | 4 | 4 | 37330.34285 | 0 | 37330.34285 | 152.905084 | 0 | 152.905084 | 1.714098
ALL | All | 64 KiB; 100% Read; 0% random | 1 | 4 | 4 | 43402.23004 | 43402.23004 | 0 | 2844.408548 | 2844.408548 | 0 | 1.474266
ALL | All | 64 KiB; 0% Read; 0% random | 1 | 4 | 4 | 19335.46808 | 0 | 19335.46808 | 1267.169236 | 0 | 1267.169236 | 3.309597
The following test was run locally on a test virtual machine against the C and E drives, which are hosted on two separate HA targets provided by the VSAs. Both HA targets sit on the same five-disk RAID0 array of 1TB SSDs.
Target Type | Target Name | Access Specification Name | # Managers | # Workers | # Disks | IOps | Read IOps | Write IOps | MBps (Decimal) | Read MBps (Decimal) | Write MBps (Decimal) | Average Response Time (ms)
ALL | All | 4 KiB aligned; 100% Read; 100% random | 1 | 8 | 8 | 178731.144 | 178731.144 | 0 | 732.082766 | 732.082766 | 0 | 0.715903
ALL | All | 4 KiB aligned; 0% Read; 100% random | 1 | 8 | 8 | 55935.78807 | 0 | 55935.78807 | 229.112988 | 0 | 229.112988 | 2.288018
ALL | All | 64 KiB; 100% Read; 0% random | 1 | 8 | 8 | 58966.77633 | 58966.77633 | 0 | 3864.446654 | 3864.446654 | 0 | 2.170342
ALL | All | 64 KiB; 0% Read; 0% random | 1 | 8 | 8 | 23573.25393 | 0 | 23573.25393 | 1544.89677 | 0 | 1544.89677 | 5.429488
StarWind confirmed that there is currently an IOps limit per LUN, and I saw increased performance when I ran the same test against two targets rather than a single target.
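To put a number on the gain from the second target, the snippet below divides the two-target IOps by the single-target IOps for each access specification. It is only a quick calculation on the figures in the tables above; the dictionary and function names are mine, not anything from IOMeter or StarWind.

```python
# IOps copied from the single-target and two-target HA results above.
SINGLE_TARGET = {
    "4 KiB random read": 124257.61,
    "4 KiB random write": 37330.34285,
    "64 KiB sequential read": 43402.23004,
    "64 KiB sequential write": 19335.46808,
}
TWO_TARGETS = {
    "4 KiB random read": 178731.144,
    "4 KiB random write": 55935.78807,
    "64 KiB sequential read": 58966.77633,
    "64 KiB sequential write": 23573.25393,
}


def scaling(single, dual):
    """Return the two-target / single-target IOps ratio per access spec."""
    return {spec: dual[spec] / single[spec] for spec in single}


for spec, factor in scaling(SINGLE_TARGET, TWO_TARGETS).items():
    print(f"{spec:24s} {factor:.2f}x")
```

All four ratios fall between roughly 1.2x and 1.5x, so the second target helps in every case but does not double throughput, which is expected given that both targets share the same RAID0 array.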
I noticed slightly better performance and lower CPU utilization using paravirtual vs LSI storage adapters in the VMs.
It appears that the 5GB write-back cache on the HA targets speeds up 64K sequential reads, which achieve greater throughput than the native numbers for the same test.
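100% random writes at 4K take a beating with StarWind, dropping from roughly 184,700 IOps natively to 37,300 IOps on a single HA target. 100% sequential writes at 64K take a smaller hit, from roughly 33,300 to 19,300 IOps.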
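The size of each hit is easier to compare as a percentage of the native result. The sketch below does that for the single HA target against the local VSA run, using only the IOps figures from the tables above; the names are hypothetical and chosen for this example.

```python
# IOps copied from the local VSA (native) and single HA target results above.
NATIVE = {
    "4 KiB random read": 210055.4378,
    "4 KiB random write": 184672.0686,
    "64 KiB sequential read": 37752.90699,
    "64 KiB sequential write": 33301.486,
}
HA_SINGLE = {
    "4 KiB random read": 124257.61,
    "4 KiB random write": 37330.34285,
    "64 KiB sequential read": 43402.23004,
    "64 KiB sequential write": 19335.46808,
}


def retained_percent(native, ha):
    """Return HA IOps as a percentage of native IOps per access spec."""
    return {spec: 100.0 * ha[spec] / native[spec] for spec in native}


for spec, pct in retained_percent(NATIVE, HA_SINGLE).items():
    print(f"{spec:24s} {pct:6.1f}% of native")
```

On these figures, 4K random writes retain only about a fifth of native IOps, 64K sequential writes a little under 60%, and 64K sequential reads come out above 100%, which matches the cache observation above.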
Overall, I'm happy with the solution so far. Support really took the time to verify the setup and benchmark.