RAID configuration


Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Fri Feb 17, 2017 8:40 am

Hello,

I need some advice about the configuration of the RAID controllers on the servers I'm planning to use as a SOFS cluster providing HA storage for Hyper-V.

Hardware:

2 x IBM x3650 M3

2 x E5506 4C 2.1 GHz
16 GB RAM
2 x PSU connected to 2 UPSes

8 x 146 GB 10K 6 Gb SAS (RAID 10)
8 x 300 GB 10K 6 Gb SAS (RAID 5)
2 x ServeRAID M5015 in each server

We're planning to use the RAID 10 array for VM OS disks and databases; the RAID 5 array will be storage for common file shares.

There is a "little" problem with the BBU on the controllers. Since these servers are pretty old, it looks like the batteries are exhausted: the RAID configuration is stuck in a constant "BBU in relearn cycle" state.
Therefore we cannot use the BBU-backed write-back policy. The options are either forced write-back, ignoring the battery status, or write-through, which is already active because of the relearning state.
It could be difficult to find new batteries to replace them. What would you advise in such a situation?
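
For reference, this is roughly how the battery and cache state can be queried, assuming the ServeRAID controllers are LSI MegaRAID-based and MegaCli is available from PowerShell (tool name and flags vary with the CLI and firmware version, so treat it as a sketch):

MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL   # battery state, charge level, learn cycle status
MegaCli64 -LDGetProp -Cache -LAll -aALL    # current cache/write policy per logical drive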
Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Fri Feb 17, 2017 1:19 pm

After updating the firmware on the RAID controllers I verified that the BBU has indeed failed, so we can only use the write-through policy (forced write-back is too much of a risk, I think).
Is this a supported configuration with Virtual SAN? Would a RAM cache help in this situation?
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Tue Feb 21, 2017 10:39 am

Hello Alexey,
With the write-through policy on the RAID controller you will not get any boost on writes; however, these settings depend on your production requirements.
I believe the L1 (RAM) cache in write-back mode can improve write performance in your case, but this should be verified by additional testing.
Please make sure that you follow the StarWind Best Practices https://www.starwindsoftware.com/starwi ... ces-manual and set the stripe size to 64K.
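
One piece of that alignment, as a sketch only, is formatting the NTFS volume that will hold the StarWind image files with a 64K allocation unit size to match the 64K stripe, e.g. from PowerShell (drive letter and label below are placeholders):

Format-Volume -DriveLetter D -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "StarWind"   # 65536 = 64K clusters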
Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Fri Mar 03, 2017 7:15 am

I did a lot of tests using diskspd on both SOFS nodes and VMs. Could you take a look at the results, please?

I find it strange that the StarWind iSCSI targets show different read performance with and without the RAM cache;
reads with the cache are actually worse. And local storage performs read operations much better.
Sequential writes are equal on the local RAID and the iSCSI targets, and so is the random read/write (80/20) test.

Another question concerns the results I got with diskspd inside the VM guest OS.
One VM is located on the local disks of our production Hyper-V server.
The other is placed on the SOFS share, which is built on a StarWind HA image.
Here the situation is a bit different: the HA VM shows similar performance for random read/writes and sequential reads, but its writes are much slower than those of the 'local Hyper-V' VM.
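
For context, the runs were along these lines (illustrative diskspd parameters and a placeholder test path, not the exact command lines behind the attached results):

diskspd.exe -c10G -b64K -d60 -t4 -o8 -w100 -Sh -L D:\test\testfile.dat   # 64K sequential writes, 10 GB file, 60 s, 4 threads, 8 outstanding I/Os, caching disabled
diskspd.exe -b8K -d60 -t4 -o8 -r -w20 -Sh -L D:\test\testfile.dat        # 8K random 80/20 read/write mix against the same file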

RAID configuration we used on SOFS nodes:

RAID 10
8 SAS HDD 10K 146GB
Stripe size: 64KB
Disk cache: enabled
Read policy: read-ahead
Write policy: write-back
I/O policy: direct

RAID configuration of Hyper-V server:

RAID 10
4 SAS HDD 10K 300GB
Stripe size: 64KB
Disk cache: default
Read policy: read-ahead
Write policy: write-back
I/O policy: direct

The StarWind device was created with a 4096-byte sector size and a 1 GB RAM write-back cache.
The logical disk allocation unit size is 64 KB.
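
The sector and allocation unit sizes the mounted volume actually reports can be double-checked with fsutil (drive letter is a placeholder):

fsutil fsinfo ntfsinfo D:   # check Bytes Per Sector and Bytes Per Cluster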
Attachments: diskspd.7z (21.35 KiB)
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Fri Mar 10, 2017 4:30 pm

Hello Alexey,
Thanks for your efforts in testing.
Please provide more information about your configuration: which networks do you use for StarWind synchronization, iSCSI, and SOFS?
Please double-check that each target is connected as specified in the step-by-step guide: https://www.starwindsoftware.com/starwi ... -v-cluster
Also, try changing the MPIO policy on each server to "Failover Only" or "Least Queue Depth"; it can give different results.
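
For example, the MSDSM default policy can be switched from PowerShell, and mpclaim shows what each disk currently uses (a sketch; this changes the global default, so existing LUNs may need to be adjusted per disk):

Get-MSDSMGlobalDefaultLoadBalancePolicy              # show the current default policy
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy FOO  # Failover Only
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD  # Least Queue Depth
mpclaim.exe -s -d                                    # list MPIO disks and their active policy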
Alexey Sergeev
Posts: 26
Joined: Mon Feb 13, 2017 12:48 pm

Wed Mar 15, 2017 1:15 pm

After I created the virtual disk with a 512-byte sector size instead of 4096, read performance became much better. But writes inside the HA VM are still very slow.

SOFS nodes network setup:

1 x 1Gb - iSCSI
1 x 1Gb - Synch
1 x 1Gb - SMB 1
1 x 1Gb - SMB 2
1 x 1Gb - SMB 3

Not much, but that's all I've got for now. We've ordered a couple of 10 Gb NICs for synchronization and new Cat6 cables. Maybe that will improve the situation.
The Hyper-V hosts use a converged network setup with 4 x 1 Gb NICs.

Perhaps with RDMA-capable hardware we could achieve better results, but what puzzles me is that when I ran the same tests on the same servers in a hyper-converged setup, I saw almost the same results. Sequential writes with a 64 KB block are 10 times slower than on our production Hyper-V server!
In the hyper-converged setup I used 3 x 1 Gb NICs for synchronization (direct connection) with both Failover Only and Least Queue Depth policies; nothing changed.
I should note that even a VM placed on a plain SMB share (without StarWind) performs twice as well.
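
As a side note, whether SMB is actually using all those links (and RDMA, once capable NICs arrive) can be checked from the Hyper-V side with the built-in cmdlets, e.g.:

Get-NetAdapterRdma               # which adapters report RDMA capability
Get-SmbClientNetworkInterface    # interfaces SMB considers usable, with RSS/RDMA flags
Get-SmbMultichannelConnection    # active multichannel connections to the SOFS share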
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Thu Mar 16, 2017 12:07 pm

Hello Alexey,
Thank you for the provided information.
Performance inside a VM depends on a lot of factors, mostly its settings and disk type. Please check that the VM disks are of the fixed type, as recommended by Microsoft.
Please check that both SOFS hosts have the same underlying storage performance, which should be equal to the CSV performance.
Please keep in mind that write performance is limited by the synchronization channel performance, so it could be the bottleneck in your case.
If you cannot identify the bottleneck, please do not hesitate to log a support case here: https://www.starwindsoftware.com/support-form, so we can review your systems together.
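
For example, the disk type of each VM can be listed, and a dynamic disk converted while the VM is off, with the Hyper-V cmdlets (paths below are placeholders):

Get-VM | Get-VMHardDiskDrive | Get-VHD | Select-Object Path, VhdFormat, VhdType, FileSize
Convert-VHD -Path D:\VMs\app.vhdx -DestinationPath D:\VMs\app-fixed.vhdx -VHDType Fixed   # run only while the VM is powered off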