StarWind vSAN Free not syncing

AraCom
Posts: 7
Joined: Mon Dec 18, 2017 9:33 am

Mon Dec 02, 2019 8:14 pm

Hello to all from the StarWind Forum :)

I'm pretty new here, so I'll try my best to describe the issue with our vSAN.

We currently run two Windows servers with vSAN Free installed, together with ESXi.
Both Windows servers have a 10 Gbit network card for sync and a second network card for the heartbeat.
Until recently it worked flawlessly, but for the past few days one server has no longer been synchronized.
Today I installed the current update on both servers, which unfortunately didn't help.
I have already tried to start the synchronization with the PowerShell scripts, but so far without success.

Does anyone have an idea how to force synchronization with vSAN Free?

These are the PowerShell errors for both HAImages:

Code:

HAImage1
Device not synchronized. Synchronize current node from partner 'iqn.2008-08.com.starwindsoftware:10.10.10.22-sas-sw-01'
Request to  ARAVSAN01.ARACOM.LOCAL ( 127.0.0.1 ) : 3261
-
control 0x00000183AB0500C0 -RestorePartnerNode:"iqn.2008-08.com.starwindsoftware:10.10.10.22-sas-sw-01"
-
200 Failed: connection with partner node is invalid.. 

Thanks in advance and kind regards,

Jonas
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Tue Dec 03, 2019 12:28 pm

Is at least one of the nodes synchronized? Which build were you running when the issue appeared?
AraCom

Wed Dec 04, 2019 12:18 pm

Yes, one of the two servers is synchronized.

It happened on build 12585. I installed build 13279 on Monday and hoped that it would start syncing normally.

But this was unfortunately not the case.

Best Regards
Boris (staff)

Wed Dec 04, 2019 3:47 pm

Can you confirm that network connectivity is not broken? Which scripts did you run, and on which nodes, to try to start the sync process?
AraCom

Thu Dec 05, 2019 6:23 am

I'm not 100% sure if the network is fine. How can I test it?

Is it the 10 Gbit connection or the heartbeat network that is giving us the errors?

Last night we also lost our second vSAN server, so we didn't have any iSCSI connections whatsoever. After I restarted the service and manually marked the HAImages as synchronized, it worked again.

I tried the script "SyncHaDevicesAdvanced" on the node that is not synchronized.
Boris (staff)

Thu Dec 05, 2019 1:02 pm

"I'm not 100% sure if the network is fine. How can I test it?"

Simply by using ping. That would give you an idea of whether networking is fine.
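
Ping only exercises ICMP, so as a complementary check (not something prescribed in this thread) one could also test whether TCP connections open on each link. Below is a minimal Python sketch of that idea. Port 3261 is taken from the control request in the log above and 3260 is the standard iSCSI port; whether your sync and heartbeat channels actually use these ports, and the addresses in the example, are assumptions to adjust for your own setup. The helper names `tcp_port_open` and `check_links` are made up for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_links(targets):
    """Map each (label, host, port) target to True/False reachability."""
    return {label: tcp_port_open(host, port) for label, host, port in targets}


# Example call (hypothetical labels; replace the address with the partner's
# sync and heartbeat IPs and run it from each node in turn):
#
# print(check_links([
#     ("sync link, iSCSI", "10.10.10.22", 3260),
#     ("sync link, control", "10.10.10.22", 3261),
# ]))
```

A port that answers ping but refuses or times out here would point at a firewall, a stopped service, or a binding to the wrong interface rather than at cabling.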
AraCom

Thu Dec 05, 2019 2:22 pm

Ah, all right, I get it. Pings always go through without problems.

Maybe I should mention that the heartbeat network card is the same one we use for our internal network (RDP and so on).

One of the two vSAN servers was switched off for almost 24 hours. Meanwhile, ESXi lost the iSCSI connections to the second host again.
After I started the first vSAN server this morning, the synchronization started immediately.
Hopefully the full synchronization will run to completion.
The following error messages appear in the event logs of the first vSAN server (see attachments).
Could this be another problem? Our configuration is as follows:
  • Both ESXi hosts have 3 SAS disks in RAID 5 (3x4TB) and 5 SSDs in RAID 5 (5x400GB)
  • Almost all of the space is passed through to the vSAN Windows machine
  • In Windows, the .img and .swdsk files are placed directly on these disks
  • Both Windows drives are NTFS
Best Regards
Attachments:
Eventlog_vSAN01-3.png (7.21 KiB)
Eventlog_vSAN01-2.png (6.96 KiB)
Eventlog_vSAN01-1.png (8.6 KiB)
AraCom

Fri Dec 06, 2019 7:23 am

So we had the same problem again today...

Everything was syncing, but at about 40% the sync aborted, at around 07:30 AM today.
As if that weren't enough, the iSCSI connection to the ESXi servers broke again during operation and all servers went down again.

I have no clue what is going wrong in our infrastructure. We didn't change a thing, and now it keeps failing.

Any ideas?

Thanks a lot and best regards!
Boris (staff)

Fri Dec 06, 2019 10:39 am

Can you check for error 129 / 153 in Windows System logs? Are they present?
AraCom

Fri Dec 06, 2019 10:42 am

Yes, under Windows -> System there are a few 129 (Time-Service) and 153 (Kernel-Boot) warnings and informational events.

But no errors from StarWind.
Boris (staff)

Fri Dec 06, 2019 1:15 pm

Are there 129 / 153 errors from the storage controller? Sorry, I did not specify the event source at first.
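
Since Windows event IDs are only meaningful together with their source, a 129 from Time-Service says nothing about storage. One way to see which sources are behind the 129 / 153 entries is to export the System log to CSV and count events per (ID, provider) pair. The Python sketch below assumes a CSV with `Id` and `ProviderName` columns (the property names Get-WinEvent objects carry); the sample rows are invented for illustration and simply mirror the sources mentioned in this thread, and `count_by_provider` is a made-up helper name.

```python
import csv
import io
from collections import Counter

# Event IDs asked about in this thread.
STORAGE_EVENT_IDS = {129, 153}


def count_by_provider(csv_text, event_ids=STORAGE_EVENT_IDS):
    """Count rows per (event ID, provider name) for the given event IDs."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            event_id = int(row["Id"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows without a usable numeric Id
        if event_id in event_ids:
            counts[(event_id, row.get("ProviderName") or "")] += 1
    return counts


# Invented sample data, NOT taken from the poster's servers.
SAMPLE = """Id,ProviderName,LevelDisplayName
129,Microsoft-Windows-Time-Service,Warning
153,Microsoft-Windows-Kernel-Boot,Information
129,Microsoft-Windows-Time-Service,Warning
7036,Service Control Manager,Information
"""

for (event_id, provider), n in sorted(count_by_provider(SAMPLE).items()):
    print(event_id, provider, n)
```

If the only providers listed are ones like those in the sample, the events are unrelated to the disks; entries whose provider is a disk or storage controller driver are the ones worth a closer look.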
AraCom

Mon Dec 09, 2019 7:16 am

Hi, Boris,

Thank you so much for your feedback. Unfortunately I couldn't find the error codes in our system.

Over the weekend I sat down with the infrastructure again and fundamentally revised the configuration.
ESXi:
- vSwitches configured
- Failover for the 10 Gbit fibre links configured
vSAN:
- Two new vSAN servers installed
- Existing hard disks converted to eager-zeroed and then imported into the new vSAN servers
- A total of 4 network cards added for management, heartbeat, sync and iSCSI traffic

Everything is running wonderfully at the moment. Furthermore, the throughput on the sync interface is more than twice as high as before.
I think I was following the wrong instructions during the original implementation. Many of the settings I made over the weekend were definitely not active before!

Thanks anyway! :)
Boris (staff)

Mon Dec 09, 2019 10:39 am

Great to know you figured it out on your own by redeploying the setup. Be sure to have a look at our Resource Library https://www.starwindsoftware.com/resource-library/ to find the relevant documentation on different configuration scenarios whenever you are about to deploy your setup.