Thread 161 caused unhandled exception - StarWind Crash

nicholas.dale · Sun Aug 01, 2021 1:38 am

After a few stable weeks, I put the StarWind cluster into maintenance mode to restart the VM hosts. After booting everything back up, everything seemed normal for a few hours, but then I started seeing the StarWind VSAN service crash on both HA nodes, making the storage unavailable. After I restart the nodes and manually resynchronise them, everything works for a few more hours before the service crashes again.

The logs show errors such as the following:

_miniDumpFilter: Thread 161 caused unhandled exception

followed by the service crashing and a crashdump appearing in the logs.
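
For anyone wanting to check their own logs for the same signature, here is a minimal Python sketch. It assumes the crash line always has the exact form quoted above; the function name and the filler log line are hypothetical and not part of StarWind.

```python
import re

# Matches lines like: "_miniDumpFilter: Thread 161 caused unhandled exception"
CRASH_RE = re.compile(r"_miniDumpFilter: Thread (\d+) caused unhandled exception")


def crashing_threads(lines):
    """Return the thread numbers from every crash line, in order of appearance."""
    return [int(m.group(1)) for line in lines if (m := CRASH_RE.search(line))]


sample = [
    "_miniDumpFilter: Thread 161 caused unhandled exception",
    "some unrelated log line",  # hypothetical filler, not from a real log
]
print(crashing_threads(sample))  # prints [161]
```

To scan a real file, pass an open file object instead of the sample list, since the function only iterates over lines.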

I've attached links to download the StarWind log files for the two hosts below.
https://drive.google.com/file/d/1woFBhm ... sp=sharing
https://drive.google.com/file/d/1KztwN6 ... sp=sharing

I previously raised this on the forum in this post: https://forums.starwindsoftware.com/vie ... f=5&t=5835 where I tried to implement the recommendations.

Sun Aug 01, 2021 2:44 am

Hello Nicholas,
Please consider updating StarWind VSAN. The procedure is described here: https://knowledgebase.starwindsoftware. ... d-version/

nicholas.dale · Sun Aug 01, 2021 3:28 am

Hi there,
I have run the starwind_update.run script after downloading the update file and both nodes have reported "No need to update".

(Update: Just checked, there are yum updates available that I could manually install)

Sun Aug 01, 2021 7:07 am

Hi,
If you are already running the 14120 build, no update is really needed.
Did you run the update earlier? If so, there should be files like *.gz, *.gz.1, *.gz.2, etc. Run ls -a in the directory where the update was downloaded, unpack the latest one, and run the update script.
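
If it is unclear which file is the latest, the selection could be sketched in Python as below. This assumes the numeric suffix grows with each repeated download (as tools such as wget name duplicates), so the highest suffix is the newest copy. The file name update.gz and both function names are hypothetical.

```python
import re


def download_index(name):
    """Return the duplicate suffix as a number: 0 for 'x.gz', 2 for 'x.gz.2'."""
    m = re.search(r"\.gz(?:\.(\d+))?$", name)
    if m is None:
        raise ValueError(f"not a .gz download: {name}")
    return int(m.group(1) or 0)


def latest_download(names):
    """Pick the name with the highest duplicate suffix (assumed to be the newest)."""
    return max(names, key=download_index)


# "update.gz" is a placeholder name used only for illustration.
print(latest_download(["update.gz", "update.gz.1", "update.gz.2"]))  # prints update.gz.2
```

Comparing the suffix as a number rather than as text keeps .gz.10 ahead of .gz.2.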

nicholas.dale · Sun Aug 01, 2021 10:25 am

I think I made a mistake when running the update previously: I forgot to unzip the downloaded files before running the updater. I have now run the update on both nodes and restarted them. I will try a resynchronisation next and let you know how it goes. Thanks.

Sun Aug 01, 2021 11:28 am

Thank you for your update. Please keep us posted.

nicholas.dale · Mon Aug 02, 2021 12:08 pm

Hi there,
I've been seeing messages like the following in my vSphere logs.

iSCSI discovery to 10.1.2.1 on vmhba66 failed. The iSCSI Initiator could not establish a network connection to the discovery address.

Lost uplink redundancy on virtual switch vSwitch1. Physical NIC vmnic1 is down. Affected portgroups:iSCSI-Heartbeat_VMKernel.

iSCSI discovery to 10.1.2.2 on vmhba66 failed. The iSCSI Initiator could not establish a network connection to the discovery address.

I also noticed a message like the following in the ESXi logs:

Lost network connectivity on virtual switch vSwitch1. Physical NIC vmnic2 is down. Affected portgroups:iSCSI-Heartbeat_VMKernel

So it is possible that I also have a hardware issue. Could any configuration issue with StarWind VSAN be exacerbating this potential network problem?
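
To see at a glance which NICs and discovery addresses are affected, the alerts could be tallied with a short Python sketch like the one below. It assumes the messages keep the exact wording quoted in this post; the function name and regular expressions are illustrative and not part of any VMware tool.

```python
import re

NIC_DOWN_RE = re.compile(r"Physical NIC (\w+) is down")
DISCOVERY_RE = re.compile(r"iSCSI discovery to (\S+) on (\w+) failed")


def summarize(messages):
    """Return (NICs reported down, discovery addresses that failed), both sorted."""
    nics, targets = set(), set()
    for msg in messages:
        if (m := NIC_DOWN_RE.search(msg)):
            nics.add(m.group(1))
        if (m := DISCOVERY_RE.search(msg)):
            targets.add(m.group(1))
    return sorted(nics), sorted(targets)


alerts = [
    "iSCSI discovery to 10.1.2.1 on vmhba66 failed. The iSCSI Initiator could not establish a network connection to the discovery address.",
    "Lost uplink redundancy on virtual switch vSwitch1. Physical NIC vmnic1 is down. Affected portgroups:iSCSI-Heartbeat_VMKernel.",
    "iSCSI discovery to 10.1.2.2 on vmhba66 failed. The iSCSI Initiator could not establish a network connection to the discovery address.",
    "Lost network connectivity on virtual switch vSwitch1. Physical NIC vmnic2 is down. Affected portgroups:iSCSI-Heartbeat_VMKernel",
]
print(summarize(alerts))  # prints (['vmnic1', 'vmnic2'], ['10.1.2.1', '10.1.2.2'])
```

With these four messages, both uplinks of vSwitch1 and both discovery addresses show up.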

Mon Aug 02, 2021 12:50 pm

Yes, these alerts point to issues with the physical NIC. Please check the cable and make sure the MTU is set to 1500 across the entire network stack: inside the VM, on the ESXi hosts, and on the switches (if there are any).
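
For the in-VM part of that check, a rough Python sketch follows. It assumes the StarWind VM is a Linux guest (the thread mentions yum) that exposes each interface's MTU under /sys/class/net/<interface>/mtu. The function name is hypothetical, and the ESXi hosts and switches still need to be checked with their own tools.

```python
from pathlib import Path


def mtu_mismatches(expected=1500, sysfs_root="/sys/class/net", skip=("lo",)):
    """Return {interface: mtu} for interfaces whose MTU differs from `expected`.

    Reads the Linux sysfs files <sysfs_root>/<interface>/mtu. The loopback
    interface is skipped by default because its MTU is normally much larger.
    """
    mismatches = {}
    for iface in sorted(Path(sysfs_root).iterdir()):
        mtu_file = iface / "mtu"
        if iface.name in skip or not mtu_file.is_file():
            continue
        mtu = int(mtu_file.read_text().strip())
        if mtu != expected:
            mismatches[iface.name] = mtu
    return mismatches
```

Calling mtu_mismatches() inside the VM returns an empty dictionary when every non-loopback interface is already at 1500.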