How to set up VSAN with Hyper-V?

Software-based VM-centric and flash-friendly VM storage + free version


danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Fri Jan 19, 2024 6:45 pm

yaroslav (staff) wrote:That's quite a project if it involves cross-hypervisor migration.
Good luck! Keep me posted.
Have a nice weekend.
Fortunately, most of the VMs had only minor changes. I had no luck trying to migrate them directly, so I ended up creating new VMs and configuring them to match the old ones. Yeah, a lot of fun :)
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Fri Jan 19, 2024 8:40 pm

Sorry to read that. Well, that's how cross-hypervisor migration [without converters] often ends.
Good luck with the projects :)
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Sun Jan 21, 2024 10:27 pm

Well, ZFS is a non-starter. If this were not HA, I'd say screw it and set sync=disabled, since a crash takes out the client as well as the storage (just like pulling the power cord). Unfortunately, in the HA storage scenario, sync=disabled makes ZFS lie to the iSCSI daemon, so the remote client believes the write went to stable storage. If the Hyper-V host (or the VSA!) crashes after the ACK to the iSCSI client but before the 5-second flush to the pool, the StarWind code will believe that block X on host A is up to date, whereas it never actually got written :(. I did experiments both ways (using an OmniOS VSA) and got approximately this:

sync=disabled
read = 1000MB/sec write = 500MB/sec

sync=standard (default)
read = 1000 MB/sec write = 100 MB/sec <=== this is no better than a consumer-grade SATA spinner!
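
(For clarity, toggling sync on the OmniOS VSA is just the usual ZFS dataset property; the dataset name below is only an example.)

zfs get sync tank/iscsivol            # check the current setting (standard by default)
zfs set sync=disabled tank/iscsivol   # ACK writes before they reach stable storage -- fast but unsafe for HA
zfs set sync=standard tank/iscsivol   # honor the initiator's sync/flush requests -- safe but slow here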

My next thought was Storage Spaces "RAID10", but it isn't really RAID10: the only guarantee is that every chunk will be replicated on another disk. Unlike real RAID10, where a second disk failure is survivable unless it is the partner of the already-failed disk, with Storage Spaces a second failure almost certainly loses some data. So, since I have only about 250 GB of VM disks, I intend to create two Storage Spaces mirrors, one for local storage (which will house a domain controller) and one for StarWind cluster storage.
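
Roughly what I have in mind for each mirror, in PowerShell (pool/volume names, size, and drive letter are placeholders; the cluster-storage mirror would be a second pool set up the same way):

$disks = Get-PhysicalDisk -CanPool $true          # disks eligible for pooling
New-StoragePool -FriendlyName "LocalPool" -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $disks[0..1]
New-Volume -StoragePoolFriendlyName "LocalPool" -FriendlyName "LocalMirror" -ResiliencySettingName Mirror -FileSystem ReFS -DriveLetter D -Size 250GB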
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Sun Jan 21, 2024 10:59 pm

Hi,

First of all, thanks for your detailed feedback and the time you spent on this case.
I often see users run individual NVMe disks rather than some form of software RAID (e.g., mdraid, vROC (a separate can of worms from my experience, yet a fun technology), or Storage Spaces). In that case they have no redundancy at the server level and rely on the StarWind HA mirror instead. Sad to have no redundancy in the box, yet a viable scenario IMO.
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Mon Jan 22, 2024 12:04 am

Always glad to provide info that might help someone...
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Mon Jan 22, 2024 4:07 am

Thanks. I highly appreciate it.
Let me know if there is anything else I can help you with.
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Wed Jan 24, 2024 4:29 pm

Well, I am stumped. I'm running into completely hopeless network performance before even trying StarWind VSAN (I wanted to test performance first).

My setup:

Two 2.1 GHz Xeon servers (2 sockets and 1 socket respectively); each runs WS2019 with the Hyper-V role active.
Each server has a 2-port Intel 10 GbE card (X710-DA2), which I had been using with Proxmox in the 3-node setup.
I recently purchased two Intel 40 GbE cards (X710-QDA1), connected back to back for the sync link.

When I download and run iperf3 (it doesn't matter which host is client and which is server), I get dreadful performance. I have found numerous articles about disabling RSS, VMQ, etc., but nothing I have tweaked has helped (jumbo frames didn't either). I have a separate physical host, also on WS2019, with a single 10 GbE card (Intel X710-DA2) connected to the same switch as the other two hosts. I have run all combinations of 10 GbE tests and see substantially worse performance with the Hyper-V hosts. I also tried SR-IOV plus deleting the vSwitch - no help.
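
For reference, the tweaks I tried were along these lines (the adapter name "Sync40G" is just a placeholder):

Get-NetAdapterRss -Name "Sync40G"                 # check current RSS state
Disable-NetAdapterRss -Name "Sync40G"
Disable-NetAdapterVmq -Name "Sync40G"
Set-NetAdapterAdvancedProperty -Name "Sync40G" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
Get-NetAdapterAdvancedProperty -Name "Sync40G" | Format-Table DisplayName, DisplayValue   # confirm what actually took effect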

I also used the OmniOS SAN/NAS physical host, so I could completely rule out anything Windows-related, and saw much better 10 GbE performance there.

I then booted the Veeam host off a Clonezilla live USB and did a couple of tests. Results:

10 GbE link (Intel X710-DA2) / Ruckus switch

omnios => clonezilla: 9 Gbit/s
clonezilla => omnios: 6 Gbit/s

omnios => ws2019: 9 Gbit/s
ws2019 => omnios: 6 Gbit/s

omnios => ws2019/hv: 3-5 Gbit/s
ws2019/hv => omnios: 4 Gbit/s

ws2019 => ws2019/hv: 5 Gbit/s
ws2019/hv => ws2019: 5 Gbit/s

40 GbE link (Intel X710-QDA1) / point to point

ws2019/hv => ws2019/hv: 4 Gbit/s <=== WTH????

ws2019 = the physical host with Veeam installed.
ws2019/hv = a physical host with the Hyper-V role.
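
For anyone reproducing these numbers, a minimal iperf3 pair looks like this (the address is just an example); a multi-stream run with -P is an easy extra data point, since one TCP stream and several streams can behave quite differently:

iperf3 -s                          # on the receiving host
iperf3 -c 172.16.0.2 -t 30         # on the sending host, single stream for 30 seconds
iperf3 -c 172.16.0.2 -t 30 -P 4    # same, with 4 parallel streams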

I guess I can live with WS2019 getting half the performance of Linux, but the 40 GbE performance?! Holy hell, 10% of the raw throughput? I will admit I didn't test the 40 GbE link without Windows, since I have guests running on both Hyper-V nodes and would need to reboot both of them into Linux. I can try that later tonight when no one is on.

This has been extremely frustrating, as there are tons of articles on the web recommending this or that tweak, none of which seems to make any difference. I'm honestly kind of shocked that Windows Server 2019 ships with default tuning this awful. I could live with mediocre 10 GbE performance, but the whole point of getting the 40 GbE cards was for StarWind sync, and 4 Gbit/s will give me sucky performance. I even spun up a couple of Debian guests and passed the SR-IOV 40 GbE card in to each of them, and still got lousy performance, which makes me wonder if Windows is still getting in the way?
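
For the record, the SR-IOV plumbing for those Debian guests was roughly along these lines (switch, adapter, and VM names are placeholders):

Enable-NetAdapterSriov -Name "Sync40G"
New-VMSwitch -Name "SyncSwitch" -NetAdapterName "Sync40G" -EnableIov $true -AllowManagementOS $false
Add-VMNetworkAdapter -VMName "debian-test" -SwitchName "SyncSwitch" -Name "SyncNic"
Set-VMNetworkAdapter -VMName "debian-test" -Name "SyncNic" -IovWeight 100   # non-zero weight requests a VF for the guest
Get-VMNetworkAdapter -VMName "debian-test" | Format-Table Name, IovWeight, Status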

If I can't get acceptable performance out of the network on Windows Server 2019, I think I'll have to look elsewhere (I don't want to go back to vSphere due to the Broadcom fiasco), and I'm not fond of Proxmox. Maybe Nutanix or something else? Sigh...
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed Jan 24, 2024 5:50 pm

Hi,

That's quite frustrating. You know, let's have a look into it together. I just want to poke around myself to see if we can make it any better.
Please log a call with us by writing to support@starwind.com; use 1101318 and this thread as your references.
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Wed Jan 24, 2024 5:57 pm

Sure thing. I can actually test the 40 GbE link with two Linux live USB installs this afternoon.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed Jan 24, 2024 6:18 pm

Keep me posted.
Thanks.
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Wed Jan 24, 2024 10:37 pm

Interesting. I thought it might be a bad 40 GbE card, or maybe the cable, so I dug into my parts bin and found a couple of ConnectX-5 (50 GbE) cards and a cable. I put those in and ran Linux from USB on both hosts: I get about 25 Gbit/s. Interestingly, setting jumbo frames hurt a bit (20 Gbit/s). I then booted WS2019/HV on both boxes and re-ran iperf3. Instead of 4 Gbit/s I get 10 Gbit/s. Still not great, but OK, I guess. Getting 40% of Linux is still not good, so I will email support as you asked, since there must be some tuning that isn't right.

P.S. My Amazon return window for the X710-QDA1 cards closes a week from today, so they're going back ASAP!
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 25, 2024 1:00 am

I had a similar problem with X710s the other day, where they just would not sync anything at all in WS2022 with Intel's drivers :(
Try the default (in-box) driver. Wrong cabling (i.e., if it is the bottleneck) often shows up in PowerShell or ESXi physical adapter outputs.
I have to admit it was a really tricky one. I am happy to carry on in DM (a small exception). Same story with the cable in that case.
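
For example, something like this shows whether each NIC negotiated the expected link speed (a downshifted link is often a cabling or transceiver issue):

# Windows
Get-NetAdapter | Format-Table Name, InterfaceDescription, LinkSpeed, Status
# ESXi
esxcli network nic list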
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Thu Jan 25, 2024 1:12 am

yaroslav (staff) wrote:I had a similar problem with X710s the other day, where they just would not sync anything at all in WS2022 with Intel's drivers :(
I have to admit it was a really tricky one. I am happy to carry on in DM (a small exception).
Well, I've already requested a return with Amazon :)
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 25, 2024 1:17 am

That was fast :)
Hope you get a new one fast too!
danswartz
Posts: 71
Joined: Fri May 03, 2019 7:21 pm

Thu Jan 25, 2024 1:18 am

Indeed :) Well, once I found I could get decent performance from the Mellanox cards... I would still like to figure out why WS2019 gets 10 Gbit/s while Linux gets 24 Gbit/s.