New lab installation and sync speeds seem low: 200 to 300 Mbps

Software-based VM-centric and flash-friendly VM storage + free version


Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Tue Mar 06, 2018 12:01 am

Hi

I'm just setting up my StarWind environment in my lab, having moved from VMware vSAN. I've got two identical Dell R710s, each with 40 GB RAM, a quad 1 GbE NIC and 1 x 10 GbE direct-connected NIC. Local storage at this stage is simply an enterprise 2 TB SATA drive and a 240 GB SSD in each machine, connected via SAS2 HBAs.

I'm playing with various storage approaches, including RDM passthrough. Currently it's configured so that I'm passing VMDKs into the StarWind hosts.

Both machines are set up with jumbo frames and 8 GB RAM; iSCSI traffic, sync and heartbeat are on the 10 GbE link, heartbeats also run over two other NICs, and the final NIC serves management and general VM access on the network. With this setup I'm getting what appears to be a low initial sync speed.
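
(For reference, the ESXi-side jumbo frame settings can be checked or applied with PowerCLI along these lines; the host, vSwitch and vmk names are placeholders for whatever carries the sync/iSCSI traffic, so treat it as a sketch.)

Connect-VIServer -Server esxi-host-01                      # placeholder host name

# MTU 9000 on the vSwitch carrying the 10 GbE sync/iSCSI traffic
Get-VirtualSwitch -VMHost esxi-host-01 -Name vSwitch1 |
    Set-VirtualSwitch -Mtu 9000 -Confirm:$false

# ...and on the VMkernel adapter used by the ESXi iSCSI initiator
Get-VMHostNetworkAdapter -VMHost esxi-host-01 -VMKernel -Name vmk1 |
    Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false

# Jumbo frames also need enabling on the vmxnet3 adapter inside the StarWind VM
# (adapter advanced properties), which isn't shown here.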

I'm not expecting blistering performance with the SATA drives, but I do know they are capable of more than 120 MB/s in sequential read/write... The issue I've got is that, looking at the network performance in Task Manager, the 10 GbE network is not going anywhere above 300 Mbps when syncing the empty devices between the two nodes... Does this seem correct, or a tad low? I was expecting to get closer to the 1 Gbps mark, as apparently VMDKs are not supposed to be that much slower than raw disk access, but at the moment they're not syncing anywhere near that figure.
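
(Quick unit sanity check using the figures above, as a trivial PowerShell calc just to show the conversion.)

# 120 MB/s of sequential disk throughput is roughly 120 * 8 = 960 Mbps on the wire,
# so a sync sitting at ~300 Mbps is using about a third of what the HDD can feed.
$diskMBps    = 120                    # quoted sequential speed of the SATA drive
$syncMbps    = 300                    # sync rate observed in Task Manager
$diskMbps    = $diskMBps * 8          # 960 Mbps
$utilisation = [math]::Round(($syncMbps / $diskMbps) * 100, 1)
"Disk can feed ~$diskMbps Mbps; a $syncMbps Mbps sync is $utilisation% of that."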

I might try and create a new HA image directly on the SSDs and see how that performs, but I'm curious for thoughts on the current setup. I will benchmark each set of drives with a local VM and see how they perform outside of the HA setup. That should confirm whether there is an underlying storage speed issue.
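
(For the local benchmark, something like Microsoft's diskspd run inside the test VM gives a comparable sequential number; this assumes diskspd.exe is available, and the drive letter, file size and duration are just examples.)

# 64 KB sequential, 60 s, 2 threads, queue depth 8, 50/50 read/write,
# software and hardware caching disabled (-Sh) so the drive itself is measured.
.\diskspd.exe -c10G -b64K -d60 -t2 -o8 -w50 -Sh D:\starwind-bench.dat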
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Tue Mar 06, 2018 9:11 am

So I wiped the image and started again. This time I IO tested the VMDKs and got a reasonable sequential speed close to what I was expecting for both the HDDs and the SSD.

I've now created a 100 GB image on the SSD and am replicating to the other SSD... Windows is reporting 3.9 Gbps over the 10 GbE network connection, which seems reasonable. Will try the HDD without flash cache and see what that does again.

Edit 1:

Creation of a 100 GB HA image on the 1.8 TB HDD synced at over 900 Mbps, peaking at 1.1 Gbps as anticipated, so I'm not sure why the large image took so long. Does a larger image take more effort to sync, thereby dropping the sync speed?
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Tue Mar 06, 2018 10:00 am

I'm going to document my testing in this thread now!

So I deleted the previous two images and created a new HA image at 1.75 TB with a 90 GB flash cache, and it is now syncing at 900 Mbps; much better than earlier!

I deleted the second heartbeat-only link and am now running a single 10 GbE sync (plus heartbeat) and a 1 GbE heartbeat... That might have been the issue, as one of the other heartbeats was on the internet-facing subnet, so a completely different link, and that might have been causing problems.

Will let this sync through, then I'll delete it and pass the entire disk via RDM into the VMs and compare VMDK vs RDM performance; after that I'll run the HDD on a Storage Spaces simple space and see how that affects throughput.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Tue Mar 06, 2018 2:48 pm

Benoire,
You did a really impressive investigation :)
Actually, you did not mention the RAID arrays for the hosts; you can find the recommended settings here.
Also, you can find a step-by-step guide here.
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Tue Mar 06, 2018 6:38 pm

Hi Oleg

Thanks for the reply. I'm used to running VMware vSAN, which worked well but required a minimum of 3 hosts; my power bill wasn't liking that, so I needed to find an alternative.

Presently I'm not running RAID on these machines; I'm just testing with single drives and cache until I'm happy with the performance, then I'll start to roll out RAID 0 arrays to maximise speed.

As a note, I went to bed last night with the sync an hour in on the 1.75 TB drives and the sync speed at 900 Mbps; I've got up now and it's at 150 Mbps with 7 hours remaining after 8 hours already done.

Are there any logs that would tell me why the sync speed would reduce by so much? I'll leave this to finish and then run benchmarks on the performance before killing and carrying on as I said earlier.
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Tue Mar 06, 2018 11:06 pm

So I've now passed the disks through via RDM to the VMs.

The SSD HA image synced at 3.9 Gbps, which is pretty much the max speed of the SSD, so fine there.

The HDD was set up in Storage Spaces as a simple space with one drive. A 100 GB image was then created on the new drive and synced at 1.1 Gbps, so 200 Mbps or so higher than the VMDK in terms of speed; like the SSD, this completed without issue.

I've now got a 500 GB image being synced. Again, the speed is the same 1.1 Gbps, but I'm curious to know where the speed drop-offs occur.

Edit: The 500 GB image synced in 1 hour and 12 minutes and only dropped to 950 Mbps. Will try expanding the image and see if any degradation in performance occurs.
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Wed Mar 07, 2018 1:47 am

So performance of the drives via RDM/Storage Spaces was fine, with no issues. IO and IOPS seemed aligned with the speed of the drives; I will now roll out a full cache + storage array and see how that performs.
PoSaP
Posts: 49
Joined: Mon Feb 29, 2016 10:42 am

Wed Mar 07, 2018 4:57 pm

Wow, you did a great job.
Benoire wrote:
So performance of the drives via RDM/Storage Spaces was fine, with no issues. IO and IOPS seemed aligned with the speed of the drives; I will now roll out a full cache + storage array and see how that performs.
Can you give more details? How did you configure Storage Spaces? Was it via the GUI or PowerShell? How did you RDM the HDD and SSD drives into the VM? What is the VM configuration?
I am asking these questions because I plan to config my environment in a similar way.
Last edited by PoSaP on Thu Mar 08, 2018 9:39 am, edited 1 time in total.
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Wed Mar 07, 2018 9:14 pm

Hi PoSaP

So firstly, I'm using VMware ESXi 6.5 with vCenter 6.5. With Hyper-V it's a little easier to get access to the drives, but I prefer ESXi.

To allow RDM access to the drives you need either an HBA that can act as a JBOD device (I use Dell H200s in my Dell R710s, but a 9211-8i or similar flashed to IT mode would offer the same outcome) or local SATA. Essentially, these HBAs in IT mode don't really do RAID at all at a hardware level and simply present the drives to the OS/hypervisor as local storage.

RDM is easier with HBAs, as there is a tick box in the advanced settings of ESXi to allow RDM passthrough (Host Config > Advanced System Settings > RdmFilter.HbaIsShared = false); this will then allow you to simply add the drives as RDM devices directly to the VM. Local storage via SATA is harder, as you either have to pass the entire SATA interface into the VM, which will normally mean ALL of the attached drives and therefore nowhere to store the base VM, or you need to go into the CLI (https://gist.github.com/Hengjie/1520114 ... 7af4b3a064) to generate an RDM 'stub' for the VM.
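
(A rough PowerCLI equivalent of the tick-box route, in case it's useful; the host, VM and device names are placeholders, so treat it as a sketch rather than a recipe.)

# Allow RDM of local HBA-attached disks by disabling the RDM filter on the host.
$vmhost = Get-VMHost -Name esxi-host-01                     # placeholder
Get-AdvancedSetting -Entity $vmhost -Name "RdmFilter.HbaIsShared" |
    Set-AdvancedSetting -Value $false -Confirm:$false

# List the local disks to find the one to pass through.
Get-ScsiLun -VmHost $vmhost -LunType disk |
    Select-Object CanonicalName, CapacityGB

# Attach it to the StarWind VM as a physical-mode RDM.
$vm = Get-VM -Name "StarWind-Node1"                         # placeholder
New-HardDisk -VM $vm -DiskType RawPhysical `
    -DeviceName "/vmfs/devices/disks/naa.XXXXXXXXXXXX"      # placeholder device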

Once you've got the drives into the Windows VM, I simply used the GUI to create everything, as I'm just using simple spaces with a single drive. If I need more, I'll use PowerShell to add the new drive and set up striping; a rough sketch of that route is below.
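
(Roughly what the PowerShell route would look like; the pool and disk names are just examples, and with a single drive you'd drop -NumberOfColumns.)

# Pool every drive that's eligible (assumes the RDM disks show up as poolable).
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "StarWindPool" `
    -StorageSubSystemFriendlyName "Windows Storage*" `
    -PhysicalDisks $disks

# Simple (striped, no resiliency) virtual disk across the pooled drives.
New-VirtualDisk -StoragePoolFriendlyName "StarWindPool" `
    -FriendlyName "StarWindData" `
    -ResiliencySettingName Simple `
    -NumberOfColumns $disks.Count `
    -UseMaximumSize

# Initialise, partition and format it to hold the StarWind image files.
Get-VirtualDisk -FriendlyName "StarWindData" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536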

VM config is Server 2016 with 8 GB RAM, a single 100 GB boot drive and then the two vSAN drives.

Network is 10 GbE direct connect and then multiple 1 GbE network connections to support VM management and heartbeat.
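
(And a PowerCLI sketch of how that VM shape could be stood up; the host, datastore and port group names are placeholders from my side, and the CPU count is only an example.)

# Server 2016 StarWind node: 8 GB RAM, 100 GB boot disk; the data disks are the
# RDMs added separately as described above.
$vm = New-VM -Name "StarWind-Node1" -VMHost esxi-host-01 `
    -Datastore "local-datastore" -GuestId windows9Server64Guest `
    -MemoryGB 8 -NumCpu 4 -DiskGB 100 -NetworkName "Management"

# Extra NICs: one on the 10 GbE direct-connect port group for iSCSI/sync,
# one on a 1 GbE port group for heartbeat.
New-NetworkAdapter -VM $vm -NetworkName "Sync-10GbE" -Type Vmxnet3 -StartConnected
New-NetworkAdapter -VM $vm -NetworkName "Heartbeat-1GbE" -Type Vmxnet3 -StartConnected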
PoSaP
Posts: 49
Joined: Mon Feb 29, 2016 10:42 am

Thu Mar 08, 2018 2:37 pm

Great, I will try a similar scenario.
But I have one question: what will happen if one of the physical drives fails?
Benoire
Posts: 25
Joined: Mon Jan 08, 2018 8:13 pm

Thu Mar 08, 2018 6:43 pm

If a drive fails on a single host, then StarWind will move all operations to a single feed from the other working VM. Once you've replaced the failed drive, StarWind will sync from the remaining host to the new drive and then bring it back online and usable.

StarWind vSAN can be considered RAIN rather than RAID, so each node becomes part of the array; more nodes = more redundancy, and more nodes = more IOPS and bandwidth. Clearly, you can also run RAID 0/1/5/6/10 etc. on the node itself to avoid having to rebuild the node from scratch. I'd prefer to use the Linux appliance here instead of the Windows VM to get access to BTRFS or even ZFS on Linux, as that would provide decent RAID speed compared to Windows as well as bit-rot protection; unfortunately, the VSA appliance is not compatible with my Xeon CPUs at the moment, and I'm waiting for v2 to see how different it is.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Fri Mar 09, 2018 11:10 am

Hi Benoire,
You did a lot! :)
Thank you for the good words about StarWind.
Linux StarWind VSA is in the testing stage now and should be available soon.