StarWind service crashed on 2 servers



ipodgorny
Posts: 5
Joined: Fri Nov 28, 2014 3:01 am

Sun Oct 04, 2020 12:02 am

Hello guys,
We have been running the free version of StarWind for a while, and it has worked great for the most part. Last week, the StarWind service stopped unexpectedly on both servers, 2 minutes apart (according to the Windows Application log).
Well, the storage crashed, a split brain occurred, and we had to select a primary, etc. It took forever to mount and resync... We are past that now and are working on rebuilding the corrupted VMs.
My question is: where can I look for an indication of why this happened? "Service stopped unexpectedly" isn't very descriptive. I've seen this message before (Windows restarted the service and the disks resynced), but it is somewhat scary when it happens on both boxes at the same time.

Thank you
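One place to start is the Service Control Manager event 7034 ("service terminated unexpectedly") quoted later in this thread; SCM events are written to the Windows System log. Below is a minimal Python sketch, assuming the System log from each node has been exported to CSV from Event Viewer. The column names (`Date and Time`, `Source`, `Event ID`, `Message`) and the `%m/%d/%Y %H:%M` timestamp format are assumptions, so adjust them to match your export. It picks out 7034 entries that mention StarWind and checks whether two nodes failed within a few minutes of each other.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed column names and timestamp format of the exported CSV; adjust as needed.
TIME_COL, SOURCE_COL, ID_COL, MSG_COL = "Date and Time", "Source", "Event ID", "Message"
TIME_FORMAT = "%m/%d/%Y %H:%M"


def service_crashes(csv_text, service="StarWind"):
    """Return sorted timestamps of SCM event 7034 rows that mention `service`."""
    crashes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get(ID_COL, "").strip() == "7034"
                and row.get(SOURCE_COL, "").strip() == "Service Control Manager"
                and service in row.get(MSG_COL, "")):
            crashes.append(datetime.strptime(row[TIME_COL].strip(), TIME_FORMAT))
    return sorted(crashes)


def near_simultaneous(node_a, node_b, window=timedelta(minutes=5)):
    """Return pairs of crash times from two nodes that fall within `window`."""
    return [(a, b) for a in node_a for b in node_b if abs(a - b) <= window]
```

Calling `service_crashes()` on each node's export and passing both results to `near_simultaneous()` would flag crashes like the two described here; the 5-minute window is an arbitrary default.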
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Mon Oct 05, 2020 3:54 am

Greetings,

Welcome to StarWind Forum.
We need the logs from both servers. For quicker and easier log collection from StarWind nodes, please use the StarWind Log Collector described in the knowledge base article below: https://knowledgebase.starwindsoftware. ... collector/.
Please use Google Drive or any other similar service to transfer the logs.
ipodgorny

Mon Oct 05, 2020 4:53 pm

Hello Yaroslav,

Here is a link. It's very kind of you to take a look. I'm also going through the Windows logs and others, but you probably know better what to look for.

https://republicplastics-my.sharepoint. ... g?e=ehhFcj
yaroslav (staff)

Mon Oct 05, 2020 8:13 pm

No worries, I am always glad to help you. Will check the logs as soon as possible.
ipodgorny

Mon Oct 05, 2020 9:46 pm

yaroslav (staff) wrote: No worries, I am always glad to help you. Will check the logs as soon as possible.

Thank you so much. The crash happened around 11AM on Sep 29th.
yaroslav (staff)

Tue Oct 06, 2020 4:17 am

Thanks for specifying the timeframe, that was really helpful!
I can see that StarWind is installed into the VMs. Can I have the VM properties, please?
On STO112, the StarWind service shut down unexpectedly at 11:02 and was started again after the restart at 11:31. On STO111, it also stopped unexpectedly at 11:02 and was started at 11:28 after the restart. Please make sure to follow the regular restart procedure every time you restart the system: https://knowledgebase.starwindsoftware. ... installed/. To me, it looks like some sort of hang: the event was written more than 20 minutes after it occurred.
I have also noticed that you are using an LSFS device (vSphere-0). Please consider migrating from LSFS to a regular IMG file, as I am not sure your system meets the LSFS requirements: https://knowledgebase.starwindsoftware. ... scription/

Also, I'd recommend updating StarWind VSAN. See the procedure here: https://knowledgebase.starwindsoftware. ... d-version/.

Investigating an unexpected shutdown of the service on STO112:
7034 QNY-STO112.ad.republicplastics.com 70928 Error The StarWind Virtual SAN service terminated unexpectedly. It has done this 1 time(s). Service Control Manager 9/29/2020 11:02
This is not that straightforward, as there is no minidump that would show us what was going on with the service. I cannot investigate the StarWind service log on STO112 either: the log that might show what was happening to the service prior to the crash has been overwritten, and it now starts at 9/29, 13:54:53. Given that, I cannot tell you why this particular crash occurred. On STO111, the logs start at 9/29, 11:28:44, which also makes it hard to investigate what was going on with the service at 11:02.
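The reasoning above comes down to one comparison: a retained log can only explain the crash if its first entry is at or before the crash time. A small Python sketch using the timestamps quoted in this post (crash at 11:02 on 9/29/2020; first retained entries at 13:54:53 on STO112 and 11:28:44 on STO111); the helper name `log_covers` is hypothetical.

```python
from datetime import datetime

# Crash time from the Service Control Manager event quoted above.
CRASH = datetime(2020, 9, 29, 11, 2)

# First retained StarWind service log entry on each node, as quoted in the thread.
FIRST_LOG_ENTRY = {
    "STO112": datetime(2020, 9, 29, 13, 54, 53),
    "STO111": datetime(2020, 9, 29, 11, 28, 44),
}


def log_covers(first_entry, event_time):
    """A log can only describe an event if it starts at or before that event."""
    return first_entry <= event_time


for node, first in FIRST_LOG_ENTRY.items():
    gap = first - CRASH
    print(f"{node}: covers crash={log_covers(first, CRASH)}, log starts {gap} after it")
```

Neither node's retained log reaches back to 11:02, which is why the crash cause cannot be read from them.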

So, what I would recommend is updating StarWind and migrating from the LSFS device to an IMG file (this is a proactive fix).

Let me know if you require any guidance on data migration.
ipodgorny

Tue Oct 06, 2020 10:34 pm

yaroslav (staff) wrote: I can see that StarWind is installed into the VMs. Can I have the VM properties, please?
2x vCPU (Host runs Intel Gold 6244), 24 GB RAM, OS Disk 40GB, 14TB RAID 5 disk mapped directly to VM via RDM

yaroslav (staff) wrote: On STO112 StarWind Service was shut down unexpectedly at 11:02 and was started after restart at 11:31.
This is exactly what we are trying to figure out: what happened there, and why the service would just shut down on 2 hosts at the same time. We didn't kick off a restart or anything like that.
These VMs run on 2 physically separate hosts. With RDM disks attached, they can't be migrated anywhere.
yaroslav (staff) wrote: I have also noticed that you are using an LSFS device (vSphere-0).
Moving away from that :)
yaroslav (staff) wrote: Also, I'd like to recommend you updating StarWind VSAN.
On it.

Investigating an unexpected shutdown of the service on STO112
yaroslav (staff) wrote: Error The StarWind Virtual SAN service terminated unexpectedly. It has done this 1 time(s). Service Control Manager 9/29/2020 11:02
The thing is that the servers didn't crash or lose power; the service just stopped working on both of them at the same time for no apparent reason.

I guess we are the 0.01% of the 99.99% uptime :)
yaroslav (staff)

Wed Oct 07, 2020 3:59 am

Greetings,

Unfortunately, since the service log has been overwritten, we cannot investigate this event.
Also, it looks like there are no minidumps in C:\Program Files\StarWind Software\StarWind (I would be really grateful if you could double-check that).
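To double-check for crash dumps, here is a short Python sketch that lists dump files under the install folder mentioned above, newest first. The `*.dmp` file pattern and the recursive search are assumptions about how such dumps would be named and where they would sit; the function name `find_dumps` is hypothetical.

```python
from datetime import datetime
from pathlib import Path

# Install folder quoted in the thread; the *.dmp naming is an assumption.
STARWIND_DIR = r"C:\Program Files\StarWind Software\StarWind"


def find_dumps(folder=STARWIND_DIR, pattern="*.dmp"):
    """Return (path, modified time) for dump files under `folder`, newest first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    found = [(p, datetime.fromtimestamp(p.stat().st_mtime))
             for p in root.rglob(pattern) if p.is_file()]
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    dumps = find_dumps()
    if not dumps:
        print("No dump files found.")
    for path, modified in dumps:
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {path}")
```

Sorting by modification time makes it easy to see whether any dump was written around the time of the crash.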
2x vCPU (Host runs Intel Gold 6244), 24 GB RAM, OS Disk 40GB, 14TB RAID 5 disk mapped directly to VM via RDM
Please consider increasing the total number of vCPUs to 8 (4 vCPUs in 2 sockets) for each VM.

Please keep an eye on your setup and let us know if the issue recurs. Please also note that RAID 5 built from spindle drives is not recommended: https://knowledgebase.starwindsoftware. ... ssd-disks/.
ipodgorny

Thu Oct 08, 2020 5:04 pm

yaroslav (staff) wrote: And, unfortunately, it looks like that there are no minidumps (will be really grateful if you double-check that).
I checked, but didn't find anything myself.
yaroslav (staff) wrote: Please consider increasing the total number of vCPUs to 8 (4 vCPUs in 2 sockets) for each VM.
Done:)
yaroslav (staff) wrote: Please also note that RAID5 out of spindle drives is not recommended https://knowledgebase.starwindsoftware. ... ssd-disks/.
All drives are SSD, no spindles.

Thank you for looking into it though.
yaroslav (staff)

Fri Oct 09, 2020 7:23 am

Hey,

Thanks for the update.
Please keep an eye on your systems and let me know if any assistance is required.