Performance Best Practices - V8 and Server 2012 R2

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

craggy
Posts: 55
Joined: Tue Oct 30, 2012 3:33 pm

Wed May 21, 2014 2:27 pm

I know there are many tips and recommendations for various previous versions of SW and Server 2008, etc., but with the release of V8 running on Server 2012 R2 I'm sure things have changed.

Just a few questions that we encountered when testing V8 on Server 2012 R2.

What are the most recent recommendations and best practices for running V8 on Server 2012 R2 bare metal?
What network-level tweaks are recommended in Windows and ESXi for optimum iSCSI performance?
What different tweaks are recommended for those using 10 GbE or InfiniBand over those using 1 GbE? (We never seemed to be able to achieve more than 400 MB/s with multipath 10 GbE, for example.)
What block size should we be using in the SW Console for virtual disks, 512 B or 4 KB?
What is the recommended cluster size when formatting NTFS volumes in Server 2012: 4 KB, 8 KB... 64 KB?
What is the recommended RAID block size for the underlying storage to optimise performance and capacity once the NTFS cluster size and SW virtual disk block size are factored in?
What are the recommended iSCSI initiator settings and timings, and the relevant tweaks to the iSCSI target to match?

I'm sure people have many more questions, but it would be great to have a central list of recommendations for people deploying V8, as a lot of these things are not easily changed once live data has been deployed.
Klas
Posts: 13
Joined: Mon Aug 27, 2012 10:49 am

Fri May 23, 2014 7:50 am

Great initiative

What L1 cache size is recommended?
What L2 cache size is recommended?
How many sync channels are optimal for a given bandwidth? Example: does 1 x 10 Gbit iSCSI need 4 x 1 Gbit sync? (In my lab with v6 and the v8 beta I found that >3 x 1 Gbit sync channels gave better performance than 1 x 10 Gbit sync.)
L1 and L2 cache versus Microsoft Storage Spaces write-back cache.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Sun May 25, 2014 9:54 am

We're busy crafting a set of technical papers on both dedicated (compute and storage layers separated) and converged (StarWind Virtual SAN and the hypervisor on the same hardware) scenarios. Sorry it's taking a bit longer than expected. For now you can recycle the V6 ones: as far as VMware vSphere goes, not that much has changed (the Microsoft Hyper-V changes, on the other hand, are dramatic).

InfiniBand is not natively supported yet (no RDMA) and running IP-over-IB is slow. We recommend a pair of 10 GbE or 40/56 GbE NICs (Mellanox is great, BTW) for the backbone, and either 1 GbE or 10 GbE as uplinks. You should be able to saturate the synchronization channels completely with a proper load on your storage cluster, so a) upgrade to V8 and b) kick the techies if something doesn't go as expected. We've improved sync channel performance and added our own multipathing for sync channels in V8, BTW.

4 KB is recommended (it's the hardware block size on modern spindles, and our internal block size with LSFS is also 4 KB). However, I'm not sure vSphere handles anything other than 512e so far.

Cluster size on the Windows host does not matter. NTFS does a very good job of keeping image files sequential to avoid fragmentation, so just make sure you don't run near full disk capacity on the host.

Leave it at the default. We have very big page sizes with LSFS, so we'll cover many RAID stripe sizes, even 1 or 2 MB stripes.

The iSCSI initiator settings can be carried over from V6. Again, not much has changed for THAT in terms of VMware (Hyper-V is different).
craggy wrote:I know there are many tips and recommendations for various previous versions of SW and Server 2008, etc., but with the release of V8 running on Server 2012 R2 I'm sure things have changed.

Just a few questions that we encountered when testing V8 on Server 2012 R2.

What are the most recent recommendations and best practices for running V8 on Server 2012 R2 bare metal?
What network-level tweaks are recommended in Windows and ESXi for optimum iSCSI performance?
What different tweaks are recommended for those using 10 GbE or InfiniBand over those using 1 GbE? (We never seemed to be able to achieve more than 400 MB/s with multipath 10 GbE, for example.)
What block size should we be using in the SW Console for virtual disks, 512 B or 4 KB?
What is the recommended cluster size when formatting NTFS volumes in Server 2012: 4 KB, 8 KB... 64 KB?
What is the recommended RAID block size for the underlying storage to optimise performance and capacity once the NTFS cluster size and SW virtual disk block size are factored in?
What are the recommended iSCSI initiator settings and timings, and the relevant tweaks to the iSCSI target to match?

I'm sure people have many more questions, but it would be great to have a central list of recommendations for people deploying V8, as a lot of these things are not easily changed once live data has been deployed.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Sun May 25, 2014 9:59 am

L1 cache = 5-10% of served capacity

L2 cache = 10-20% of served capacity
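
To put that rule of thumb into numbers, here is a quick back-of-the-envelope sketch (Python, for illustration only - the helper below is not a StarWind tool):

Code:
def cache_sizing_gb(served_tb):
    """Suggested L1 (RAM) and L2 (flash) ranges per the 5-10% / 10-20% rule."""
    served_gb = served_tb * 1000  # decimal TB -> GB keeps the arithmetic simple
    l1 = (0.05 * served_gb, 0.10 * served_gb)
    l2 = (0.10 * served_gb, 0.20 * served_gb)
    return l1, l2

for tb in (4, 8, 20):
    (l1_lo, l1_hi), (l2_lo, l2_hi) = cache_sizing_gb(tb)
    print(f"{tb} TB served -> L1 {l1_lo:.0f}-{l1_hi:.0f} GB RAM, "
          f"L2 {l2_lo:.0f}-{l2_hi:.0f} GB flash")
# 4 TB served -> L1 200-400 GB RAM, L2 400-800 GB flash
# 8 TB served -> L1 400-800 GB RAM, L2 800-1600 GB flash
# 20 TB served -> L1 1000-2000 GB RAM, L2 2000-4000 GB flash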

Use at least dual sync channels (we don't require any switch for a basic config, so throwing in a pair of extra 10 GbE or 40/56 GbE NICs is comparatively cheap). Write performance of a LU is limited by sync channel performance, so the more you put in, the faster you go. It's not normal for 3 x GbE to beat 1 x 10 GbE, so I think it may be worth asking the techies to take a closer look at what you have.
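
As a simplified way to think about it (my own back-of-the-envelope model, not an official StarWind formula): every acknowledged write to a mirrored LU also has to cross the sync channel, so sustained write throughput is roughly capped by the slower of the front-end path and the aggregate sync bandwidth.

Code:
def write_ceiling_gbps(frontend_gbps, sync_links_gbps):
    # Every acknowledged write must reach the partner node over the sync
    # channel, so the ceiling is the slower of the two paths.
    return min(frontend_gbps, sum(sync_links_gbps))

print(write_ceiling_gbps(10, [1, 1, 1]))  # 3  -> 3 x 1 GbE sync caps a 10 GbE front end
print(write_ceiling_gbps(10, [10, 10]))   # 10 -> dual 10 GbE sync keeps up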

(I'll reply to StarWind cache vs. MSFT cache in a separate post on this thread.)
Klas wrote:Great initiative

What L1 cache size is recommended?
What L2 cache size is recommended?
How many sync channels are optimal for a given bandwidth? Example: does 1 x 10 Gbit iSCSI need 4 x 1 Gbit sync? (In my lab with v6 and the v8 beta I found that >3 x 1 Gbit sync channels gave better performance than 1 x 10 Gbit sync.)
L1 and L2 cache versus Microsoft Storage Spaces write-back cache.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Sun May 25, 2014 11:55 am

OK, now for StarWind L1/L2 vs. MSFT CSV/SS WB cache (keep in mind I'm talking about clustered configs only; Storage Spaces features for clustered and non-clustered setups do differ!).

L1:

MSFT has a limited-purpose block cache for CSV (anything that does not belong to a CSV obviously cannot be cached by it). Initially MSFT created the CSV cache for CSV-consuming apps (mostly SQL Server and Hyper-V), as they deal with VHD(X)s and DBs respectively in so-called "pass-through mode" (the file system cache provided by the Windows Cache & Memory Manager is bypassed and all I/O goes straight to/from disk) for data integrity purposes. To mitigate the complete lack of a file system cache, MSFT brought the CSV cache to the table.

Compared to a file system cache it has one major drawback: it's READ-ONLY. It does accelerate READs but does nothing for WRITEs. Moreover, it not only fails to accelerate writes, it also does nothing to absorb them and keep the underlying flash happy - all I/Os go straight to flash, basically burning it out (this is absolutely **CRITICAL**, and it's true for VMware Virtual SAN as well; those guys also don't use RAM as an L1 cache and direct all initial I/O to flash only).

Also, the CSV cache is not synchronized between hypervisor nodes, so a VM migrated from HostA -> HostB starts from a "cold" state, which obviously affects performance. You'll notice this with every I/O-hungry VM. The CSV cache is not deduplicated even if the served content has dedupe enabled on it. The result is wasted RAM and poor ROI, as the same data blocks are stored in memory multiple times (for the file system cache MSFT has the ability to pin dedupe buckets in memory, so it's a CSV cache problem and not an MSFT problem in general). Hash protection is not used: you can have bit rot with the CSV cache and you'll never know it. Even using ReFS with integrity streams doesn't help here (there's no way to use ReFS data integrity with running VM images either way...).

StarWind has a distributed write-back L1 cache. We don't care what you layer on top of it - a Hyper-V CSV, a cluster-aware file system like VMFS, or a raw LU exposed to an Oracle DB. We support everything. Our cache accelerates both reads and writes. It is designed to absorb writes and keep the underlying flash (L2 or primary storage) as happy as possible with a reduced amount of writes (the same block overwritten multiple times goes to flash once with StarWind, but in the MSFT case as many times as the app actually wrote it). We're synchronized between multiple nodes and keep the content coherent, so a) we're safe (there are multiple replicas of the same cached data) and b) a VM moved from one hypervisor to another always starts from a "hot" state with all the cached data already in memory. Our L1 cache is deduplicated (if paired with LSFS as the underlying storage, of course, and with dedupe enabled on it), so we can store more data in cache memory than normally expected. StarWind uses strong 256-bit signatures for cache blocks, so if cache content is damaged we'll dynamically discard it and replace it with data from the underlying nodes.
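
To illustrate the write-absorption point above, here is a toy model (my own sketch, not StarWind code): with a write-back cache, repeated overwrites of the same block are coalesced in RAM and reach the backing flash only once per flush, while a path with no write caching hits the flash on every application write.

Code:
class WriteBackCache:
    def __init__(self):
        self.dirty = {}          # block number -> latest data, held in RAM
        self.flash_writes = 0    # writes that actually reach the backing flash

    def write(self, block, data):
        self.dirty[block] = data # overwrite in RAM, no flash I/O yet

    def flush(self):
        self.flash_writes += len(self.dirty)  # one flash write per dirty block
        self.dirty.clear()

class NoWriteCache:
    def __init__(self):
        self.flash_writes = 0

    def write(self, block, data):
        self.flash_writes += 1   # every application write goes straight to flash

wb, nc = WriteBackCache(), NoWriteCache()
for _ in range(100):             # the app rewrites the same block 100 times
    wb.write(42, b"x")
    nc.write(42, b"x")
wb.flush()
print(wb.flash_writes, nc.flash_writes)   # 1 vs 100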

L2:

MSFT requires you to use SAS-attached flash for the WB cache. SAS is a very poor choice here, as it's both expensive (compared to commodity SATA SSDs) and slow (compared to enterprise PCIe cards). Want to use huge loads of cheap, well-performing SATA SSDs from Intel for cache? MSFT cannot do it. Want to use that 1M IOPS, 1 TB Fusion-io card for cache? MSFT cannot do it either. Next: the SS write-back cache can be configured only once, when the storage space is built. You cannot throw in more flash later to an existing LU's WB cache, and you cannot remove the cache from a LU you no longer want to cache either. The cache size cannot be changed either: you provide settings for the pool, but once the virtual disk is created, you're done. You cannot have cache settings different from the pool settings, so if you have a mirrored pool, you have to use a mirrored cache in that pool. Very inflexible... With Clustered Storage Spaces all nodes have access to the same content, so the caches are "distributed" (well, only in the sense that you have many paths to the same content), but again, as with the CSV cache, MSFT has no dedupe option for the WB cache. And no hash protection.

StarWind can use any type of flash for cache. So you may have an all-SATA primary storage pool for capacity paired with a PCIe-attached flash card for performance (that's actually the recommended config for us). We can add or remove flash from a LU any time you want. Need more IOPS? Throw in more cache. The LU now handles ice-cold data? Take the L2 cache away and use it to accelerate something else. We can pair any type of cache protection with any underlying RAID, so you can use a simple (non-mirrored) cache with RAID5/6 on the primary storage. The StarWind L2 cache is distributed between multiple nodes, so it's safe to use and fast (the number of I/O paths increases with every new replica added). Full dedupe ability when paired with a dedupe-enabled LSFS config. Protected with hash sums.

As a side note, we give users the ability to select between multiple policies: "Write-Back for L1 and Write-Back for L2" vs. "Write-Back for L1 and Write-Through for L2". The second option uses RAM to accelerate reads and writes and to absorb writes, and uses flash to accelerate reads only, keeping the flash as healthy as possible.
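
For clarity, here is a toy sketch (my own, not our actual implementation) of how the two policies treat data destaged from the L1 RAM cache: with write-back L2 the dirty block is staged on flash first, while with write-through L2 the write bypasses flash entirely and flash is used for read caching only, which keeps it healthier.

Code:
def destage_from_l1(policy, block, data, l2_flash, primary):
    """Move a dirty block out of the L1 RAM cache (toy model)."""
    if policy == "wb-l1/wb-l2":
        l2_flash[block] = data   # staged on flash; primary storage is updated later
    else:                        # "wb-l1/wt-l2": L2 flash is a read cache only
        primary[block] = data    # the write bypasses flash, saving flash wear

l2_flash, primary = {}, {}
destage_from_l1("wb-l1/wt-l2", 7, b"data", l2_flash, primary)
print(len(l2_flash), len(primary))  # 0 1 -> flash untouched, SSD stays healthy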

So, as you can see, we beat MSFT on caching all around. With V8 we don't recommend using either the CSV cache or the SS WB cache. Use the StarWind built-in caching instead.
Klas wrote:Great initiative

What L1 cache size is recommended?
What L2 cache size is recommended?
How many sync channels are optimal for a given bandwidth? Example: does 1 x 10 Gbit iSCSI need 4 x 1 Gbit sync? (In my lab with v6 and the v8 beta I found that >3 x 1 Gbit sync channels gave better performance than 1 x 10 Gbit sync.)
L1 and L2 cache versus Microsoft Storage Spaces write-back cache.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Tue May 27, 2014 8:01 am

Gentlemen,
Thank you for pointing out these important questions.
I'll update our best practices guide shortly and make sure that these questions get into the document.
Max Kolomyeytsev
StarWind Software
DUOK
Posts: 19
Joined: Thu Dec 08, 2011 9:33 pm

Fri Jun 06, 2014 8:35 pm

anton (staff) wrote:L1 cache = 5-10% of served capacity

L2 cache = 10-20% of served capacity
Hi,

Anton, L1 Cache 5-10%, L2 10-20%, are you sure?

Serving 4 TB would need 200-400 GB of RAM and 400-800 GB of SSD... :shock:
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jun 06, 2014 9:26 pm

Yes, I'm sure. You can go with less (of course), but the numbers I gave are optimal for performance.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

DUOK
Posts: 19
Joined: Thu Dec 08, 2011 9:33 pm

Sat Jun 07, 2014 10:43 am

In a paper by Max & Bodhan, StarWind wrote:

(http://www.starwindsoftware.com/starwin ... -practices)

"It is recommended to provision 256MB–1GB of caching per each terabyte of the HA device’s size. Although, the
maximum recommended cache size is 3GB. For most scenarios, bigger cache is not utilized effectively."

This paper was written last year, I guess for StarWind v6.

For planning, what's the optimal recommendation?

Is it this?
StarWind v6 - serving 8 TB - L1: 3 GB RAM
StarWind v8 - serving 8 TB - L1: 400-800 GB RAM
L2 cache, v6 & v8: 800-1600 GB SSD?

Anton, sorry to insist, but these seem like very different configurations.

Thanks for your support,
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Sun Jun 08, 2014 10:14 am

V6 and V8 are quite different beasts. V6 had no L2 cache at all (V8 has one), the cache policies are different, and the cache-serving algorithms are heavily modified. Also, V8 has an option for a deduplicated cache (when paired with LSFS), making things totally different: basically, V8 has two independent L1 caches when running with LSFS.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

barrysmoke
Posts: 86
Joined: Tue Oct 15, 2013 5:11 pm

Tue Jun 10, 2014 8:53 am

Even using ReFS with integrity streams doesn't help here (there's no way to use ReFS data integrity with running VM images either way...)
ReFS in a Windows Hyper-V scenario could be used at multiple layers:
The storage pool for the main RAID0/10 array could be ReFS.
The StarWind iSCSI target could be mounted and formatted ReFS.
The VM's boot disk has to be NTFS; however, any secondary disks could be ReFS.

Can you elaborate on your statement above about there being no way to use data integrity with running VM images?
LSFS uses strong checksums, so ReFS is not recommended (a waste of CPU). Also, ReFS is not stable enough (at least not yet).
Another Anton quote, from another thread: http://www.starwindsoftware.com/forums/ ... tml#p20033

There are more advantages to ReFS, such as built-in self-healing and not having to run chkdsk on very large volumes (which could mean days of downtime in some circumstances).
I had a 5 TB RAID array under Windows go corrupt here recently, and that has me thinking about ReFS again. This was not under StarWind, but it is about the fourth corruption I've experienced in the past few years.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Fri Jun 13, 2014 1:29 pm

Barry, thank you for the input!
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
barrysmoke
Posts: 86
Joined: Tue Oct 15, 2013 5:11 pm

Sun Jun 15, 2014 3:47 am

I found some links talking about integrity streams and running VMs; the article recommended not using ReFS on Hyper-V servers where they store running VMs.
Does this extend to the VHDX container files StarWind uses on storage pools, or is it just limited to VMs running on ReFS?

Another question that comes to mind is about a VMware environment: could ReFS be used to house a StarWind storage pool, with iSCSI target VHDX files shared out to VMware and formatted VMFS?
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Sun Jun 15, 2014 12:17 pm

This means you can use ReFS to host StarWind containers (FLAT and LSFS) up to the point when (if?) we switch to RAW. But don't use ReFS to format your CSVs, as it's absolutely pointless: dedupe is not supported, integrity streams for VHDX are not supported, and NTFS performance is better.
barrysmoke wrote:I found some links talking about integrity streams and running VMs; the article recommended not using ReFS on Hyper-V servers where they store running VMs.
Does this extend to the VHDX container files StarWind uses on storage pools, or is it just limited to VMs running on ReFS?

Another question that comes to mind is about a VMware environment: could ReFS be used to house a StarWind storage pool, with iSCSI target VHDX files shared out to VMware and formatted VMFS?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Tue Jun 17, 2014 6:42 pm

DUOK wrote:
anton (staff) wrote:L1 cache = 5-10% of served capacity

L2 cache = 10-20% of served capacity
Hi,

Anton, L1 Cache 5-10%, L2 10-20%, are you sure?

Serving 4 TB would need 200-400 GB of RAM and 400-800 GB of SSD... :shock:
Can I revisit this one? These figures seem incredibly high, guys! 4 TB is actually a tiny amount of disk space these days. We've got 8 TB on our current primary storage. The StarWind default was 128 MB, but we increased it to 8 GB. So for a 20 TB disk system, you're saying you need a minimum of 5% RAM cache for any kind of reasonable performance. Err, guys - who has 1 TB of RAM in their SAN??

I love StarWind, but the lack of real-world guidance on sizing is hurting you.

Cheers, Rob.