StarWind VSAN - Performance drops when replicating

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

walkablenormal
Posts: 2
Joined: Mon Nov 18, 2024 3:41 pm

Mon Nov 18, 2024 4:08 pm

Hi,

I’m currently testing StarWind VSAN appliances on VMware to provide shared storage for a cluster.

The setup consists of two CVM appliance nodes running on VMware ESXi. These appliances use all-flash storage and have virtual disks attached for their shared storage pool. The ESXi hosts are connected via 10Gbps networking, with data and replication traffic segregated across separate distributed virtual switches and interfaces.

During testing, I observed that write performance starts very fast but drops to a sustained 50 MB/s after approximately 300 MB of data has been written.

What I’ve tested (roughly the commands shown below):
• Network performance: Verified using iPerf3, achieving ~8 Gbps on a 10 Gbps link.
• Disk performance: Tested with dd and hdparm; write performance is around 300-500 MB/s.
• Network anomalies: Checked for packet drops using ethtool; no drops or errors reported.
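
For reference, this is roughly what I ran (the interface name, target IP, and paths are placeholders, and the exact flags are from memory):

# bandwidth between the data/replication interfaces
iperf3 -c 10.0.0.2 -P 4 -t 30

# sequential write to the backing disk, bypassing the page cache
dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=4096 oflag=direct

# quick read timing of the device
hdparm -t /dev/sdb

# NIC counters, looking for drops/errors
ethtool -S ens192 | grep -iE 'drop|err'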

What I noticed:
When I shut down the second VSAN node, write performance improves significantly, increasing to 200-300 MB/s. However, once the secondary node is restarted, write speed drops back to 50 MB/s.

I can’t pinpoint why the performance decreases so drastically when replication is active. Any suggestions on what I should check or adjust to resolve this issue would be greatly appreciated.
yaroslav (staff)
Staff
Posts: 3219
Joined: Mon Nov 18, 2019 11:11 am

Mon Nov 18, 2024 6:27 pm

Welcome to StarWind Forum.
Distributed vSwitches are not a supported option for iSCSI and Sync traffic. See the guides (https://www.starwindsoftware.com/resour ... al_papers/) for more details and the VSAN Best Practices (https://www.starwindsoftware.com/best-p ... practices/). Check the system requirements too (https://www.starwindsoftware.com/system-requirements).
Try connecting the storage as RDM or pass-through to the StarWind VSAN VM.
Replication consumes write speed, as the system is effectively double-writing.
May I ask how you measure the performance, and what RAID level is used?
Good luck with your tests!
P.S. Try a Windows VM and the Windows-based VSAN.
walkablenormal
Posts: 2
Joined: Mon Nov 18, 2024 3:41 pm

Tue Nov 19, 2024 9:54 am

Hey!

Thanks for the swift reply!

I know I'm going against several of StarWind's best practices, but I really want to fit this solution into the vSphere setup I'm currently running.

- dvSwitches are used because I need to reuse the existing uplinks. I know this goes against best practices, but these are 10 Gb uplinks that aren't congested at all by current workloads. If iPerf is to be believed, I don't think the network should be an issue. Data and replication networks use separate uplinks. What is the reason dvSwitches aren't supported and standard vSwitches are favoured?

- The storage that backs the vSphere environment is all-NVMe flash and is *really* fast, and it isn't congested by any currently running workloads. It's backed by RAID-5, which has a write penalty. But when I bypass StarWind VSAN or disable its replication, the speed skyrockets, so I have a hard time blaming the storage for the write-speed decrease.
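
(For context on that write penalty: a classic RAID-5 small write costs four back-end I/Os — read old data, read old parity, write new data, write new parity — so random-write IOPS land at roughly a quarter of what the raw drives can do. Large sequential writes that fill whole stripes avoid most of this.)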

I measure performance by copying some large files within the same disk. I use a Windows VM for this, with a disk mounted from the datastore I'm testing.

- StarWind VSAN (replication enabled) - fast start, then sustained 50 MB/s writes.
- StarWind VSAN (replication disabled) - sustained 200-300 MB/s writes.
- non-StarWind - sustained 300 MB/s writes.

I'm willing to try the Windows VM running the Starwind software. Can you provide me with a download for the free version?

With kind regards,
yaroslav (staff)
Staff
Posts: 3219
Joined: Mon Nov 18, 2019 11:11 am

Tue Nov 19, 2024 10:02 am

Thanks for your update. iPerf shows raw bandwidth; it is poor at showing what happens to the network under load.
Copying is a bad test, as it involves caches and buffers. First, measure the underlying storage performance; then run the cumulative tests, as sketched below. See more at https://www.starwindsoftware.com/best-p ... practices/.
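
For example, an fio run along these lines bypasses the buffers entirely (the device path is a placeholder — point it at a dedicated test disk, as the write test is destructive):

# sequential write test with direct I/O, no page cache involved
fio --name=seq-write --filename=/dev/sdX --ioengine=libaio \
    --rw=write --bs=64k --iodepth=8 --numjobs=1 --direct=1 \
    --runtime=60 --time_based --group_reporting

Run it against the underlying disk first, then against the StarWind device, and compare the two results.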

The link for the StarWind VSAN Windows-based service is available in the email you get when downloading the software: https://starwindsoftware.com/tmplink/starwind-v8.exe
FlashMe
Posts: 17
Joined: Wed Aug 24, 2022 8:23 pm

Sun Nov 24, 2024 10:12 pm

Hey :)

What is the VM configuration for your test? You need a test VM with:

8 vCPU
4-8 GB RAM
Thick Eager Zeroed disk for the test
NVMe controller

For NVMe, 300 MB/s is really poor. I've seen much higher values, like 2 GB/s and more, on systems with RAID-5.
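
If you test from inside that Windows VM, something like this diskspd run avoids the file-copy caching problem (drive letter and file size are placeholders):

diskspd.exe -b64K -d60 -o8 -t4 -w100 -Sh -c10G D:\test.dat

-w100 makes it 100% writes, and -Sh disables software and hardware write caching, so you see what the datastore actually sustains.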
yaroslav (staff)
Staff
Posts: 3219
Joined: Mon Nov 18, 2019 11:11 am

Mon Nov 25, 2024 12:37 am

Hi,

Very good point! Thanks for your comment.
amorapotter
Posts: 1
Joined: Tue Jan 14, 2025 4:35 am

Tue Jan 14, 2025 4:42 am

If you're using synchronous replication, each write operation must be acknowledged by both nodes before being committed. This could be the cause of the reduced performance, especially if the network latency or write confirmation process is slow.
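
As a rough illustration (made-up numbers): with 64 KB writes at queue depth 1, a 0.5 ms acknowledgement round trip per write caps throughput at about 64 KB / 0.5 ms ≈ 128 MB/s, no matter how fast the disks are. Deeper queues hide some of that latency, but a file copy issues mostly shallow, buffered I/O.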
yaroslav (staff)
Staff
Posts: 3219
Joined: Mon Nov 18, 2019 11:11 am

Tue Jan 14, 2025 8:34 am

Hi,

Yes, but the reduction is normally not that huge.