Hello all.
We're in the process of setting up a new 2-node (plus witness) Proxmox deployment to replace a VMware deployment.
We're just using the free license for StarWind.
I've followed the guide and was able to set up a 2-node StarWind HA cluster with replication/sync working, but we had very poor performance, so I want to check some things here.
Background:
We have two Lenovo SR630 V3 servers, specs:
1x Xeon Silver 4514Y 16 core cpu
64GB Ram
ThinkSystem M.2 RAID B540i-2i SATA/NVMe
--Above controller has two 480GB enterprise NVMe SSDs in a RAID mirror; this is the OS drive for Proxmox
ThinkSystem RAID 9350-8i 2GB Flash PCIe 12Gb Adapter
--Above controller has 4x 7.68TB SATA enterprise SSDs
Broadcom NX-E PCIe 10Gb 2-Port Base-T Ethernet Adapter
Broadcom 57416 10GBASE-T 2-port OCP Ethernet Adapter
Proxmox is 9.0.10.
The StarWind CVM is deployed on the local NVMe drive on each host, and each one is assigned 8 cores and 8GB RAM.
The two hosts have direct connections between them using 10Gb NICs for replication, and another set for sync.
Those two networks are both MTU 9000.
Each host has another 10Gb NIC going to our core switches for general traffic.
We've set up the IOMMU options in the host BIOS and in Proxmox, and we're passing the 9350 RAID controller on each host through to the StarWind machine.
We don't have super high performance requirements. We're mostly just running Windows VMs like domain controllers, file servers, and some small applications, but no databases with high I/O requirements.
If I run iperf3 between the StarWind machines on the replication and sync networks, I get around 9.8Gb/s between the two machines, and 30+ within the same machine.
So I think the networking side is all fine.
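For reference, here is a rough conversion of that iperf3 figure into MB/s. This is just my own back-of-the-envelope sketch (the function name is made up, and it ignores iSCSI/TCP overhead), not anything from the StarWind guide:

```python
# Convert the iperf3 line rate to MB/s (decimal units, no protocol overhead).

def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second."""
    return gbps * 1000 / 8

link_mb_s = gbps_to_mbytes_per_s(9.8)  # iperf3 result between the two CVMs
print(f"Each 10Gb link can carry roughly {link_mb_s:.0f} MB/s")
```

So each link should be good for somewhere around 1.2GB/s before overhead.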
Initially, I created a pool in Starwind with all 4 drives, using software raid 5 (we want the space more than the highest performance). I then created a volume and lun.
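To show why we leaned toward RAID 5, here is a quick capacity comparison for our 4x 7.68TB drives. It's a simplified sketch using the textbook formulas (raw capacity only, no filesystem or metadata overhead), and the function name is just something I made up:

```python
# Raw usable capacity for a set of equal-size drives under common RAID levels.

def usable_tb(level: str, drives: int, size_tb: float) -> float:
    """Return raw usable capacity in TB for the given RAID level."""
    if level == "raid5":
        # One drive's worth of space goes to parity.
        return (drives - 1) * size_tb
    if level == "raid10":
        # Half the drives hold mirror copies.
        return drives / 2 * size_tb
    raise ValueError(f"unsupported RAID level: {level}")

for level in ("raid5", "raid10"):
    print(f"{level}: {usable_tb(level, 4, 7.68):.2f} TB usable")
```

That works out to roughly 23TB for RAID 5 versus roughly 15TB for RAID 10.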
I ran the iSCSI commands on each Proxmox host, added the iSCSI LUN, and confirmed multipath was running with multipath -ll, which showed both connections.
After the StarWind boxes finished their initial sync, we installed Windows Server 2025 in a guest VM stored on the StarWind LUN.
For the guest we used OVMF/UEFI, q35 machine type, VirtIO SCSI single, 4 cores, and 8GB RAM. Once Windows was installed, we installed the QEMU guest agent and VirtIO drivers.
We quickly noticed the VMs were running really slow.
I ran winsat to test read and write speeds to the drive. We were seeing write speeds around 1.5-3MB/s, and read speeds around 150-250MB/s.
I tried different options for the VM's disk, like writeback cache, which helped a bit, but still in that same range.
I then moved that VM to the nvme volume, and got read and writes in the hundreds of MB/s.
I removed that iscsi lun from proxmox, deleted the lun, volume, and pool in starwind.
I rebooted the host, went into the 9350's setup, and created a hardware RAID 5 volume there.
Once booted up again, I went into starwind, which still has that raid controller passed to it, and now I see just the 1 big volume instead of the 4 drives.
So I created a pool on that, then volume, and lun, and re-added it to proxmox again.
I moved the VM over to this volume, and now my write speeds are great, 600+MB/s and 350+MB/s read. So this would work fine for us, if this is an approved method.
I also tested going back to no hardware raid, and in starwind I used all 4 disks to make a ZFS volume. This one had performance around 200MB/s read, and 10MB/s write.
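To put the write numbers from the three tests side by side, here is a small comparison. The figures are the rough winsat values from above ("600+" is entered as 600, and the ZFS result as a flat 10), so treat the ratios as ballpark only:

```python
# Write throughput (MB/s) reported by winsat in the guest, as (low, high).
results = {
    "software RAID 5": (1.5, 3.0),
    "ZFS (4 disks)": (10.0, 10.0),
    "hardware RAID 5": (600.0, 600.0),
}

def speedup(baseline, other):
    """Return the (min, max) ratio of `other` over `baseline` throughput."""
    b_lo, b_hi = baseline
    o_lo, o_hi = other
    return o_lo / b_hi, o_hi / b_lo

hw = results["hardware RAID 5"]
for name in ("software RAID 5", "ZFS (4 disks)"):
    lo, hi = speedup(results[name], hw)
    print(f"hardware RAID 5 vs {name}: {lo:.0f}x to {hi:.0f}x faster writes")
```

So hardware RAID 5 is writing a couple of hundred times faster than software RAID 5 on the same disks, which is why I suspect something is off rather than this being normal overhead.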
So my questions are:
-Does it seem right that software RAID 5 would perform this poorly?
-Is the setup as it is now, with a hardware RAID 5 array created on the host, a recommended one?
-If it's not supported, is there anything I can try to improve performance with either software RAID or ZFS?
-Is ZFS preferred? And would I be doing this just in StarWind when creating it (custom, zfs)?