VSAN Free Crashed

DWHITTRED
Posts: 17
Joined: Sun Dec 01, 2019 11:38 pm

Mon May 04, 2020 12:29 am

Hi,
I have been evaluating Virtual SAN Free, and last night it crashed unexpectedly and recorded this in the log:

5/4 3:36:32.777196 634 error: Sp: *** CStarPackCoreNew::CStarPackCore::FlushingThread (8807) (0x000002668C020000) pNextContext is NULL
5/4 3:36:32.777326 634 error: Sp: *** CStarPackCoreNew::CStarPackCore::FlushingThread (8772) (0x000002668C020000) pNextContext is NULL
5/4 3:36:32.777338 634 error: Sp: *** CStarPackCoreNew::CStarPackCore::InternalFlushEx (8532) (0x000002668C020000) pContext is NULL
5/4 3:36:32.824361 634 debug: *** _miniDumpFilter: The program encountered a serious error and may be closed. Crash dump will be created.
Please, save the log file and the crash dump and report the problem to support@starwindsoftware.com
5/4 3:47:44.989310 634 debug: *** _miniDumpFilter: Minidump 'C:\Program Files\StarWind Software\StarWind\starwind.20200504.033632829.mdmp' created successfully.
5/4 3:47:45.086960 634 error: Sp: xxx CStarPackCoreNew::CStarPackManager::Stop (772) Destroying CStarPackThread object at 0x00000269112FB450
5/4 3:47:45.087033 634 error: Sp: xxx CStarPackCoreNew::CStarPackManager::Stop (772) Destroying CStarPackThread object at 0x00000269112FB870
5/4 3:47:45.087054 634 error: Sp: xxx CStarPackCoreNew::CStarPackManager::Stop (772) Destroying CStarPackThread object at 0x00000269112FB5F0
5/4 3:47:45.087070 634 error: Sp: xxx CStarPackCoreNew::CStarPackManager::Stop (772) Destroying CStarPackThread object at 0x00000269112FB610
Could you help me understand why this happened? This also happened about two months ago.

I used the StarWind Log Collector before restarting the service, so I should have copies of any information you may need.
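
For anyone triaging a similar crash, the excerpt above can be filtered mechanically. The following is a minimal Python sketch, not a StarWind tool: it assumes every entry starts with a `month/day time id level:` prefix as in the lines above. The meaning of the `634` field is not stated anywhere in this thread, so it is kept as an opaque `ident` string, and the function names are made up for illustration.

```python
import re

# Layout assumed from the excerpt above: "<M/D> <H:M:S.us> <id> <level>: <message>".
# What the third field (634) means is not documented here, so it stays an opaque string.
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}) "
    r"(?P<time>\d{1,2}:\d{2}:\d{2}\.\d+) "
    r"(?P<ident>\S+) "
    r"(?P<level>\w+): "
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of fields, or None for lines without the prefix (e.g. continuation lines)."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

def null_context_errors(lines):
    """Collect error entries whose message reports a NULL context pointer."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "error" and "is NULL" in entry["message"]:
            hits.append(entry)
    return hits

sample = [
    "5/4 3:36:32.777196 634 error: Sp: *** CStarPackCoreNew::CStarPackCore::FlushingThread (8807) (0x000002668C020000) pNextContext is NULL",
    "5/4 3:36:32.824361 634 debug: *** _miniDumpFilter: The program encountered a serious error and may be closed. Crash dump will be created.",
    "Please, save the log file and the crash dump and report the problem to support@starwindsoftware.com",
]

for entry in null_context_errors(sample):
    print(entry["time"], entry["message"])
```

Lines that do not begin with the prefix, such as the second line of the `_miniDumpFilter` message, are skipped rather than treated as errors.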
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Mon May 04, 2020 10:38 am

Hi DWHITTRED,

I am afraid that a small portion of the StarWind service log is not enough; the dump file holds the key.
Could you please share the full logs with me? Please use StarWind Log Collector for that purpose: https://knowledgebase.starwindsoftware. ... collector/.
I would also be happy to see the dump file.
Use Google Drive to share the logs (they might be too large to post on the forum).
DWHITTRED
Posts: 17
Joined: Sun Dec 01, 2019 11:38 pm

Sun May 17, 2020 1:42 am

Hi,

Apologies for not getting back to you - I kinda expected a notification through my emails and forgot to check the forum (sorry!)

Below is a link to my logs from the StarWind Log Collector:

https://drive.google.com/file/d/1KRZe91 ... sp=sharing

I haven't uploaded the dump file yet - it's 6.3 GB and will take a long time to upload. Can you confirm whether you need a copy of the dump file before I start that upload process?

Kind regards,
Daniel
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Sun May 17, 2020 9:23 am

Greetings,

The archive you provided contains only StarWind VSAN service logs. StarWind Log Collector also collects system logs, application logs, information on StarWind HA devices, and networking connections. That is far more informative than the StarWind VSAN logs alone.
Yes, we need the minidump: when the service crashes, all the useful information is in the minidump.
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Wed May 20, 2020 12:04 pm

Hi,

Done with log investigation.
Let me clarify what has caused the issue.
5/4 3:36:32.777196 634 error: Sp: *** CStarPackCoreNew::CStarPackCore::FlushingThread (8807) (0x000002668C020000) pNextContext is NULL
5/4 3:36:32.777326 634 error: Sp: *** CStarPackCoreNew::CStarPackCore::FlushingThread (8772) (0x000002668C020000) pNextContext is NULL

Events indicate that there are not enough CPUs assigned. Try using 8 CPUs (4 vCPUs in 2 sockets). Also, consider using at least 8 GB of RAM.

I have also noticed that you used LSFS. Please, note that we do not recommend LSFS for any production use due to the intense growth of LSFS devices. The LSFS container description is here: https://knowledgebase.starwindsoftware. ... scription/.
Please consider updating to the latest build. Download it at https://www.starwindsoftware.com/tmplin ... ind-v8.exe and follow the update procedure at https://knowledgebase.starwindsoftware. ... d-version/.

Please uninstall the following components: StarWind VSS Provider, StarWind Cluster Service, and StarWind SMI-S Agent. They are not needed.
Uninstall StarWind VSS Provider – run commands below from CMD:
cd "C:\Program Files\StarWind Software\StarWind\VSS"
stop_.bat

Uninstall SMI-S Agent – run commands below from CMD:
cd "C:\Program Files\StarWind Software\StarWind\OpenPegasus\bin\"
ConfiguratorConsole.exe --stop --name StarWindSMISAgent
cd "C:\Program Files\StarWind Software\StarWind\OpenPegasus\bin\"
ConfiguratorConsole.exe --uninstall --name StarWindSMISAgent

Uninstall StarWind Cluster service – run commands below from CMD:
cd C:\Windows\Microsoft.NET\Framework\v4.0.30319
installutil.exe /u "C:\Program Files\StarWind Software\StarWind\StarWindCluster\StarWind.ClusterService.exe"

I have escalated this issue to the R&D team. I will let you know if I learn anything interesting from them.
DWHITTRED
Posts: 17
Joined: Sun Dec 01, 2019 11:38 pm

Sun May 24, 2020 11:07 am

Hi,

Thank you for the information. I have some additional questions.
yaroslav (staff) wrote: Events indicate that there are not enough CPUs assigned. Try using 8 CPUs (4 vCPUs in 2 sockets). Also, consider using at least 8GB of RAM
This is a physical machine, so I cannot provision additional vCPUs. It is used only as a storage device and doesn't run other services. It also has 32 GB of RAM, and I have sized my LSFS device based on your RAM requirements for inline de-duplication. Looking at the historical performance data, the machine does not seem to see heavy CPU utilisation even when I am maxing out my iSCSI network speed - for example, in the last 24 hours the CPU did not exceed 58% (from the StarWind Management Console performance graph). Could you recommend some performance benchmarks that I need to meet for this to not fail? For example, a minimum CPU synthetic PassMark score from [ ... ]

yaroslav (staff) wrote: I have also noticed that you used LSFS. Please, note that we do not recommend LSFS for any production use due to the intense growth of LSFS devices.
This contradicts StarWind's published information on LSFS. In your existing LSFS documentation (going back to 2014) you state that it is the ideal storage device for virtualised workloads. How can this be if you do not recommend its use? Is your published information incorrect? Can you please confirm this statement? I find it very concerning to be told that this is not recommended after I have followed your advertising, whitepapers, and documentation.

yaroslav (staff) wrote: Please consider updating to the latest build.
After this failure I upgraded to the latest build as of May 4th as part of my troubleshooting. I will update to the latest version as of May 6th.

yaroslav (staff) wrote: Please uninstall the following components: StarWind VSS Provider, StarWind Cluster Service, and StarWind SMI-S Agent. They are not needed.
Thank you for pointing this out; I will remove these components as requested.
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Mon May 25, 2020 3:02 pm

DWHITTRED wrote: Could you recommend some performance benchmarks that I need to meet for this to not fail? for example, a minimum CPU synthetic PassMark score from [ ... ].
You can use any convenient tool that is recommended by community.
DWHITTRED wrote: How can this be if you do not recommend its use?
Do not get me wrong. You can use it for production, however, please make sure that the environment meets the recommendations of StarWind mentioned at https://knowledgebase.starwindsoftware. ... scription/. Please be aware of storage consumption: LSFS files can occupy 3 times more space compared to initial LSFS size. Snapshots require additional space to store them.

Please let us know if there is anything else I can assist you with.
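
To put the storage figure above into numbers, here is a rough Python sketch. It reads "3 times more space" as up to 3x the device's initial size, which is an assumption because the wording could also mean more than that. The snapshot reserve is a hypothetical input, since no figure for snapshots is given in this thread, and the function name is made up for illustration.

```python
def lsfs_backing_space_gb(device_size_gb, growth_factor=3.0, snapshot_reserve_gb=0.0):
    """Estimate the underlying disk space to set aside for one LSFS device.

    growth_factor=3.0 follows the "3 times" remark above (interpreted as 3x the
    initial size); snapshot_reserve_gb is a placeholder the reader must supply.
    """
    if device_size_gb <= 0:
        raise ValueError("device_size_gb must be positive")
    return device_size_gb * growth_factor + snapshot_reserve_gb

# Example: a 2048 GB LSFS device with 500 GB set aside for snapshots.
print(lsfs_backing_space_gb(2048, snapshot_reserve_gb=500))
```

Under these assumptions, a 2048 GB device would need roughly 6644 GB of underlying storage.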
DWHITTRED
Posts: 17
Joined: Sun Dec 01, 2019 11:38 pm

Wed May 27, 2020 12:50 am

Hi,
yaroslav (staff) wrote:You can use any convenient tool that is recommended by community.
I am not following. I asked what level of performance I need to meet.
yaroslav (staff) wrote:Events indicate that there are not enough CPUs assigned.
What you have said here is that I need more CPU - what I am trying to find out is how much more CPU. What is the minimum clock speed, core count, or performance benchmark that I need to meet?
yaroslav (staff) wrote:Do not get me wrong. You can use it for production
Thank you for the confirmation; however, in your previous post you literally said not to use it in production:
yaroslav (staff) wrote:Please, note that we do not recommend LSFS for any production use
Do you understand why I am confused?

Regards,
Daniel
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Wed May 27, 2020 3:17 pm

Daniel,

I can fully understand why you are confused. Yes, you can use it for production. Sorry for misleading you.
DWHITTRED wrote: what I am trying to find out is how much more CPU?
I thought that you were running StarWind VSAN inside a VM. That is why I advised you to add CPU.
Here are the requirements for LSFS: https://knowledgebase.starwindsoftware. ... scription/. You have already seen this document, and it says nothing about CPU requirements. The server should have at least 8 CPU cores, 2 GHz, and 8 GB of RAM. These are the requirements we recommend for VMs, but I think that your server is much better than that. Frankly speaking, the error you faced is quite interesting, and I have forwarded the logs to R&D to study them. We have had this issue reported only twice; however, it has not been reproduced before.
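
The minimums quoted above (8 CPU cores, 2 GHz, 8 GB of RAM) can be checked against a host with a few lines of Python. This is only an illustrative sketch: the dictionary keys and the `shortfalls` helper are made-up names, and the sample host's core count and clock speed are invented values; only the 32 GB of RAM comes from this thread. Passing the check says nothing about whether the crash is explained.

```python
# Minimums quoted in the post above: 8 CPU cores, 2 GHz, 8 GB of RAM.
MINIMUMS = {"cpu_cores": 8, "cpu_ghz": 2.0, "ram_gb": 8}

def shortfalls(host):
    """Return the names of the quoted minimums that the host does not meet."""
    return [key for key, minimum in MINIMUMS.items() if host.get(key, 0) < minimum]

# Hypothetical host: only the 32 GB of RAM comes from this thread;
# the core count and clock speed are made-up illustration values.
host = {"cpu_cores": 4, "cpu_ghz": 3.4, "ram_gb": 32}
print(shortfalls(host))
```

A missing key is treated as zero, so an incomplete host description is reported as a shortfall instead of silently passing.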
DWHITTRED
Posts: 17
Joined: Sun Dec 01, 2019 11:38 pm

Wed Jun 03, 2020 12:34 am

Thank you for your reply. I really do appreciate the assistance.

Since I opened this ticket, VSAN has crashed two more times. One of the crashes happened last night. I am happy to keep providing information to help your R&D if needed. I have the minidump files from both events.

One crash happened on 27th May on build 13569 and the other on 3rd June on build 13586. The latest one occurred after I had followed your previous advice and removed the unused StarWind components.
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Wed Jun 03, 2020 2:42 pm

DWHITTRED,

Could you upload everything (i.e., logs and minidump) the same way you did before?
Thank you in advance!
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jun 11, 2020 11:19 am

yaroslav (staff) wrote:
Could you recommend some performance benchmarks that I need to meet for this to not fail? for example, a minimum CPU synthetic PassMark score from [ ... ] .
You can use any convenient tool that is recommended by community.
How can this be if you do not recommend its use?
Do not get me wrong. You can use it for production, however, please make sure that the environment meets the recommendations of StarWind mentioned at https://knowledgebase.starwindsoftware. ... scription/. Please be aware of storage consumption: LSFS files can occupy 3 times more space compared to initial LSFS size. Snapshots require additional space to store them.

Please let us know if there is anything else I can assist you with.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

DWHITTRED
Posts: 17
Joined: Sun Dec 01, 2019 11:38 pm

Fri Jun 12, 2020 3:45 pm

Hi,
Just following up on this and wondering if there is anything else I can try?
This last crash was a bit devastating, as it ended up corrupting the filesystem within the LSFS device. I was able to restore my VMs from backups, but the experience has left me unsure whether to restore back to the LSFS device.
Regards,
Daniel
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Mon Jun 15, 2020 3:49 am

Greetings Daniel,

Sorry to read that.
Could you share the latest minidump and logs with us, please?