New Cluster Slow and Can't Sync
Posted: Tue Jul 20, 2021 12:39 am
Hello! Following my recent support request, I'm setting up new servers, first for a speed bump and second to reconfigure my networking to match vSAN's recommendations. The issue I'm hitting shows up in testing and is preventing me from promoting the new cluster to production.

I'm using a pair of Dell R720xd servers. Each server has Windows Server 2016 Datacenter, an 8-core CPU, 128GB of RAM, 4x10G NICs (two bonded for general network/heartbeat and two bonded directly between the servers for sync), a PERC H710 Mini, two 120GB SSDs in RAID 1 for the OS, and five 4TB SSDs in RAID 0 for data.

When the cluster is put under any kind of load, my vSAN HA disks become desynced and cycle between syncing and not synced every ~10 minutes. On top of that, the performance of all 3 HA disks is very poor (VMs take seconds to register a click and the file server takes minutes to show a folder). When the system was unloaded, I could run a disk benchmark on either the raw drive or the vSAN drive and get 2200-2600 MB/s sequential reads. Once loaded, the raw drive stays at the same speed but the vSAN drive drops to 800 MB/s (which should be fine but obviously isn't).

The three HA disks I have are: one for VMs, one for a file server, and one for a remote machine, all stored on the RAID 0. The test load I'm using is an older checkpoint of my production data/VMs, so I know that the system should be able to handle it.

The production system is two Dell R510s, each with Windows Server 2016 Datacenter, dual 6-core CPUs, 128GB of RAM, 2x10G NICs (bonded for network and sync, with 1x1G directly between nodes for heartbeat), a Dell RAID card whose model I can't remember, four 500GB SSDs in RAID 5, and six 6TB HDDs in RAID 6. This cluster has the same 3 HA disks, with the VMs and OS on the SSD RAID and the other two HA disks on the HDDs. It is working perfectly other than a recent VSS issue with vSAN.

I have logs from the new cluster available on request. Thanks in advance!