StarWind VSAN on Windows: installation problems & solutions
Posted: Thu Nov 09, 2017 7:41 pm
I started using this software about a month ago and had nothing but problems initially, but with some help from Ivan and a lot of troubleshooting and deleting/recreating volumes, I think I have it running pretty well. Copying data back to the VSAN cluster from USB drives I hit 350 MB/s+ and 6 Gbit/s across my InfiniBand link. These are the things that worked for me, but your mileage may vary.
Issue #1: Slow drive performance.
Solution(s)
#1. I was getting weird performance on one of the servers, so I broke the array and tested each drive individually; one of my brand-new 4 TB WD Red drives was writing at 35 MB/s. (See the DiskSpd sketch after this list for an easy way to test each drive.)
#2. I had an LSI 9260 in one server and an LSI 9271 in the other, and the 9260 was getting around half the performance of the 9271. I upgraded it to a 9265 and the speeds are now comparable.
#3. Make sure your RAID controllers have write-back cache enabled and backed by a BBU (the MegaCLI sketch after this list shows how to check on LSI cards).
#4. RAID 10, just do it; don't fall for the advertised speeds of RAID 5.
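To put a number on each individual drive, DiskSpd (Microsoft's free I/O benchmark) is the easiest route. A minimal sequential-write run looks like the sketch below; the drive letter and test-file path are placeholders for your own layout:

Code:
    # 64K sequential writes for 30 seconds, 2 threads, 8 outstanding I/Os,
    # against a 1 GB test file on the drive under test (E: is a placeholder).
    .\diskspd.exe -b64K -d30 -o8 -t2 -w100 -c1G E:\sw-test.dat

Anything far below the drive's rated throughput (my bad WD Red managed 35 MB/s) is your culprit.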
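For the write-back check on LSI controllers, MegaCLI can show and set the cache policy. Whether the binary is MegaCli.exe or MegaCli64.exe depends on your install, so treat the name below as an assumption:

Code:
    # Show the current cache policy for all logical drives on all adapters:
    .\MegaCli64.exe -LDGetProp -Cache -LAll -aAll
    # Switch to write-back (only do this with a healthy BBU):
    .\MegaCli64.exe -LDSetProp WB -LAll -aAll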
Issue #2: Network performance.
Solution(s)
#1. My InfiniBand link had an MTU issue; on Windows it needs to be 2048. Don't try to set it to 4096. Seriously, don't. (A sketch of the MTU change follows this list.)
#2. I also had a bad InfiniBand card, which was causing slow writes, slow transfers, and random disconnects.
#3. I had two NICs in one server with IPs on the same subnet; I had to move them to different subnets so the heartbeat and replication traffic stayed separated.
#4. There are some NIC settings on the iSCSI NICs that help performance; make sure to apply them. I used a script put out by a competitor that disabled Nagle, unbound IPv6, and so on. That sped things up 10-25%. (A sketch of the same tweaks also follows this list.)
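Setting the MTU from PowerShell looks roughly like this. The adapter name and the "Jumbo Packet" property name are assumptions; drivers label it differently, so list the advanced properties first and use whatever yours exposes:

Code:
    # See which advanced properties your driver actually exposes:
    Get-NetAdapterAdvancedProperty -Name "InfiniBand1"
    # "InfiniBand1" is a placeholder adapter name; many drivers call the
    # MTU property "Jumbo Packet" or just "MTU".
    Set-NetAdapterAdvancedProperty -Name "InfiniBand1" -DisplayName "Jumbo Packet" -DisplayValue "2048"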
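I won't repost the competitor's script, but the two tweaks I called out map to standard Windows settings: the TcpAckFrequency and TCPNoDelay registry values under the NIC's interface GUID disable delayed ACK and Nagle, and Disable-NetAdapterBinding drops IPv6 from the NIC. "iSCSI1" is a placeholder; run this once per iSCSI NIC:

Code:
    $nic = Get-NetAdapter -Name "iSCSI1"          # placeholder adapter name
    # Unbind IPv6 from this NIC (ms_tcpip6 is the IPv6 binding component):
    Disable-NetAdapterBinding -Name $nic.Name -ComponentID ms_tcpip6
    # Disable Nagle and delayed ACK on the interface, keyed by its GUID:
    $key = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\$($nic.InterfaceGuid)"
    Set-ItemProperty -Path $key -Name TcpAckFrequency -Value 1 -Type DWord
    Set-ItemProperty -Path $key -Name TCPNoDelay -Value 1 -Type DWord

Reboot (or at least disable/enable the NIC) for the registry values to take effect.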
Issue #3: Slow initial replication.
Solution:
Create the drives as 1 GB, set up replication, then extend them afterwards; the full initial sync then only has to cover 1 GB instead of the whole volume. Time-consuming, but just do it.
Issue #4: Slow copy speed to synced drives.
Solution(s)
#1. TOO MUCH CACHE. My RAID controllers have 1 GB of cache, I initially had another 1 GB cache on the StarWind side, plus I had enabled a 20 GB SSD cache in Storage Spaces. If you have RAID controller cache, don't use StarWind cache on top of it, and if you are using Storage Spaces, don't get fancy and add a ton of SSD caching. All that cache has to flush at some point, which is why your transfer speeds look like a roller coaster. Once I turned off all the StarWind cache and left the Storage Spaces cache at 1 GB, speeds were a sustained 300 MB/s or higher.
#2. Try disabling ODX in the StarWind config; that setting had me limited to 100 MB/s. (Make sure to stop the StarWind VSAN service while editing this file; see the stop/edit/start sketch after this list.)
#3. Make sure to set <iScsiDiscoveryListInterfaces value="1"/> in the StarWind.cfg file (again, stop the StarWind VSAN service while editing it).
#4. Make sure your CPU isn't maxing out; this amount of traffic can kill an underpowered CPU. (A quick Get-Counter sample below shows how to check.)
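For both StarWind.cfg edits above, the workflow is stop, edit, start. The service name and install path below are assumptions from my setup; verify yours with Get-Service and your install directory:

Code:
    Stop-Service StarWindService        # assumed service name, check Get-Service
    notepad "C:\Program Files\StarWind Software\StarWind\StarWind.cfg"
    #  - set <iScsiDiscoveryListInterfaces value="1"/>
    #  - search the file for "ODX" to find the entry to disable; the exact tag
    #    name can differ between builds, so don't trust a name from memory
    Start-Service StarWindService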
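To see whether the CPU is actually the bottleneck, sample it while a big copy runs:

Code:
    # Total CPU every 2 seconds for a minute; sustained 90%+ means the box is underpowered.
    Get-Counter '\Processor(_Total)\% Processor Time' -SampleInterval 2 -MaxSamples 30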