LSFS Defragmentation Issue

KevinR · Thu Feb 15, 2018 7:43 pm

Further to my post in Blowfish-IT's thread about StarWind generating heavy disk I/O from reading LSFS log files (with little to no write activity) even when there is no workload, I noticed that the defragmentation levels reported by the console seem quite off (>900%?). The physical space being consumed is also 8-10 times the size of the data actually stored, which has led to the system exhausting the underlying storage and crashing.

Example vdisk5 LSFS HA device as seen from node A (highlighted statistics):

[Attachment: IMG_0008.PNG]

Example vdisk5 LSFS HA device as seen from node B (highlighted statistics):

[Attachment: IMG_0009.PNG]
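For anyone who wants to check the same ratio on their own nodes, here is a minimal sketch of the arithmetic behind the "8-10 times" figure. It is only an illustration: the function names are made up for this post, and it assumes you already know how much data is logically stored on the device, since the sketch cannot read that from StarWind.

```python
from pathlib import Path


def directory_size_bytes(folder: Path) -> int:
    """Sum the sizes of all regular files under folder (recursively)."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def space_amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Return how many times more space is used on disk than is logically stored."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes
```

Pointing `directory_size_bytes` at the folder that holds a device's files (the path is whatever your setup uses) and dividing by the stored data size gives the ratio; a value of 8 to 10 would match what is described above.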

I haven't seen this problem for a few years now, but it seems to have crept back into the latest build (11818)? While the StarWind process is running, I'm unable to delete any of the LSFS log files for a given device; they're all locked, which is the behavior I would expect. If I restart the StarWind service on a node, however, I can select all of the log files and move them: only the ones that StarWind seems to care about remain locked, and the rest are safely purged. This gets rid of hundreds of GB of outdated log files. The same procedure can be repeated for different LSFS devices on different nodes, all with the same result. I have to repeat it every few weeks to keep the consumed storage under control.
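The manual "select everything and move it" step described above could be sketched roughly like this. It is a hedged illustration of that manual step, not a vendor-supported procedure: the function name and both directory arguments are hypothetical, and it assumes the holding folder is on the same volume as the device folder so that a move is a plain rename. On Windows, renaming a file that another process holds locked raises an `OSError`, which is what the sketch relies on to leave in-use files alone.

```python
import os
from pathlib import Path


def relocate_unlocked_files(device_dir: Path, holding_dir: Path):
    """Try to move every file out of device_dir; skip any the OS refuses to move.

    Returns (moved, skipped) as two lists of file names.
    Assumes holding_dir is on the same volume as device_dir.
    """
    moved, skipped = [], []
    holding_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(device_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            # A rename fails with OSError if the file is locked or cannot be moved.
            os.rename(path, holding_dir / path.name)
            moved.append(path.name)
        except OSError:
            skipped.append(path.name)
    return moved, skipped
```

Moving rather than deleting keeps the files recoverable until the device has been confirmed healthy; the holding folder can be emptied afterwards.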

Anyone else seeing this behavior?

Kevin

Mon Feb 19, 2018 9:51 am

Hi Kevin,
Could you please collect the logs from your systems and PM them to me so that we can better understand the problem you are facing? Also, please provide more details about your system configuration.
You can collect the logs using this tool.
Thank you!

Wed Feb 21, 2018 5:07 pm

Hi Kevin,
We investigated the issue with defragmentation of the HA device. The cause was identified and a proper fix has been implemented; it will be available in the next StarWind build.
For now, you can restart the StarWind service on one server, wait until its devices are fully mounted, and then do the same on the other server. Do not forget to run the FlushCacheAll.ps1 script from the StarWindX PowerShell examples folder before restarting the service.
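The order of operations in this workaround can be sketched as a small loop. This is only an outline under stated assumptions: the three callables are hypothetical placeholders that you would have to supply yourself (for example, wrappers that run FlushCacheAll.ps1, restart the service, and check whether the devices have finished mounting), and running the flush once per node is an assumption rather than something stated above. Nothing here calls a real StarWind API.

```python
import time
from typing import Callable, Sequence


def rolling_restart(
    nodes: Sequence[str],
    flush_cache: Callable[[str], None],       # hypothetical: run the cache flush for a node
    restart_service: Callable[[str], None],   # hypothetical: restart the service on a node
    devices_mounted: Callable[[str], bool],   # hypothetical: True once devices are fully mounted
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> None:
    """Flush, restart, and wait for full mounting on one node before touching the next."""
    for node in nodes:
        flush_cache(node)
        restart_service(node)
        deadline = time.monotonic() + timeout_seconds
        while not devices_mounted(node):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"devices on {node} did not finish mounting")
            time.sleep(poll_seconds)
```

The point of the loop is that the second node is never restarted until the first reports its devices as fully mounted, so one side of the HA pair stays available throughout.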

KevinR · Thu Feb 22, 2018 6:02 am

That's great news that you found the problem, Oleg!

When you say to run FlushCacheAll.ps1 before I restart the service, is that necessary even though I have only write-through caches in VSAN?

Any idea when the next build will be released?

Thanks,
Kevin

Thu Feb 22, 2018 10:40 am

Hi Kevin,
I suggested running FlushCacheAll.ps1 before restarting the service to speed up the process and ensure a proper restart.
The next build should be released in the middle of next month.

KevinR · Thu Feb 22, 2018 5:37 pm

OK, thanks for the tip and the schedule update, Oleg.

Kevin

Fri Feb 23, 2018 8:20 am

And thank you, Kevin!