Cluster Storage offline, can't bring online

yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Tue Sep 17, 2024 8:22 am

Hi,

My apologies that I did not follow the original thread well. Let us try again :)
But, what we don't understand is why VSAN does not run "as intended" when we turn the whole thing on. The VM disappears, clustering fails and we have to reconnect the whole VSAN over again, sync the targets, revive the storage, etc.. Similar to the issue from the start of the thread.
Do you have a write-back cache on your devices?
Full synchronization is expected. It should start when both nodes are online again. HA will be available over iSCSI while full synchronization is running.
Also, make sure the DCs are out of the cluster.
Last but not least. CSVs have their own failover threshold. If they are reachable over iSCSI, it might be necessary to move them between the nodes to bring them online.

Please also note that Microsoft Failover Cluster is a third-party product. All StarWind VSAN does is feed the storage into it. If the storage is available (i.e., one of the partners is synchronized and can be reached over iSCSI (be it over an old session or a new one)), the issue is related to clustering.
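The availability rule above can be written down as a small check. This is only a sketch of the reasoning, not StarWind code: the `Partner` class, its fields, and both function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partner:
    """Hypothetical view of one HA partner, filled in by hand."""
    name: str
    synchronized: bool
    reachable_over_iscsi: bool


def storage_available(partners):
    """Storage counts as available if at least one partner is
    both synchronized and reachable over iSCSI."""
    return any(p.synchronized and p.reachable_over_iscsi for p in partners)


def layer_to_investigate(partners):
    """If the storage is available, look at the failover cluster;
    otherwise look at the storage side first."""
    return "failover cluster" if storage_available(partners) else "storage"
```

For example, one synchronized and reachable partner plus one partner that is down points to the cluster, while one unsynchronized partner plus one missing partner points to the storage side.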
Mareo
Posts: 18
Joined: Tue May 21, 2024 11:35 am

Tue Sep 17, 2024 10:27 am

Thanks for your update.
Do you have a write-back cache on your devices?
If you are referring to VSAN, then yes.
Also, make sure the DCs are out of the cluster.
I can confirm the DCs are out of the cluster.
Last but not least. CSVs have their own failover threshold. If they are reachable over iSCSI, it might be necessary to move them between the nodes to bring them online.
Would this be the synch part?
Please also note that Microsoft Failover Cluster is a third-party product. All StarWind VSAN does is feed the storage into it. If the storage is available (i.e., one of the partners is synchronized and can be reached over iSCSI (be it over an old session or a new one)), the issue is related to clustering.

Oh, so the whole issue would actually be related to Failover Clustering service rather than VSAN?

Would it be better to rebuild the whole cluster, to have it better configured? Because these things are going to keep happening. If I do consider that, do I need to save the CSV image or can I safely delete it and create new images and targets?
yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Tue Sep 17, 2024 4:21 pm

If you are referring to VSAN, then yes.
Remove it, please. If the write-back cache is enabled, the chances that a manual synchronization trigger is needed are higher. See how to disable the cache here: https://knowledgebase.starwindsoftware. ... -l1-cache/
Full synchronization after a power outage is also expected. See more:
Reasons for full synchronization: https://knowledgebase.starwindsoftware. ... may-start/
Failure journals: https://knowledgebase.starwindsoftware. ... a-devices/
Would this be the synch part?
No, synchronization runs in parallel. The storage has to be REACHABLE over iSCSI (i.e., connected or ready to be connected).
Oh, so the whole issue would actually be related to Failover Clustering service rather than VSAN?
If the storage is accessible, it is more of a Failover Cluster issue.
Would it be better to rebuild the whole cluster, to have it better configured? Because these things are going to keep happening. If i do consider that, do I need to save the CSV image or can I safely delete it and create new images and targets?
I do not think so. These behaviors are built into both the cluster (thresholds, the need to start the cluster manually sometimes) and StarWind VSAN (full synchronization, the need to manually mark devices as synchronized when write-back cache is used, etc.).
Mareo
Posts: 18
Joined: Tue May 21, 2024 11:35 am

Wed Sep 18, 2024 6:45 am

What bothers me is that I still don't have the iSCSI target I had before for the CSV, and I can't seem to restore it.
yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 18, 2024 7:09 am

Mario,

You need to mark the HA device as synchronized manually.
You can also try removing the write-back cache; devices are then more likely to start full synchronization on their own.
Mareo
Posts: 18
Joined: Tue May 21, 2024 11:35 am

Wed Sep 18, 2024 7:28 am

I did mark the device as synchronized manually, but I am only able to do it for one node. If I try it for the second one, I always get "HAImage2 not found".

Also, I did remove the write-back cache as you instructed, but it doesn't seem to update in the StarWind console, which still shows the default value for it.
EDIT: Can I provide you with some screenshots (and which ones)? I'm afraid that I am not explaining the situation thoroughly.

EDIT2: Correction. I have the CSV iscsi for one node, but I do not have it for the second
yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 18, 2024 7:42 am

Is full synchronization running?
EDIT2: Correction. I have the CSV iscsi for one node, but I do not have it for the second
The CSV should be available over iSCSI already so bring it online.
Mareo
Posts: 18
Joined: Tue May 21, 2024 11:35 am

Wed Sep 18, 2024 7:52 am

yaroslav (staff) wrote:
Wed Sep 18, 2024 7:42 am
Is full synchronization running?
EDIT2: Correction. I have the CSV iscsi for one node, but I do not have it for the second
The CSV should be available over iSCSI already so bring it online.


Pardon me for not understanding the matter completely... What do you mean by full synchronization?

The CSV is available over iSCSI, but only over one target (node). I don't have it connected for the second one.
Just as it is listed for srv2-witness, we also used to have srv2-csv.

Image
yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 18, 2024 8:00 am

Hi,

Please press the refresh button. If the CSV is there at all it should be listed as reconnecting. Please provide the screenshot of the management console.
Pardon me for not understanding the matter completely... What do you mean by full synchronization?
Full synchronization is the process in which one side overwrites the other (reasons: https://knowledgebase.starwindsoftware. ... may-start/). Fast synchronization happens after the service or server is restarted gracefully.
Please go to the Management Console (available for download here: https://starwindsoftware.com/tmplink/starwind-v8.exe) and check the devices' status.
Mareo
Posts: 18
Joined: Tue May 21, 2024 11:35 am

Wed Sep 18, 2024 8:04 am

Please press the refresh button. If the CSV is there at all it should be listed as reconnecting. Please provide the screenshot of the management console.
Image

Is this ok? I can unblur some info if needed.
yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 18, 2024 9:17 am

There is no device that can publish the storage over iSCSI: one is not synchronized, and the other is missing. Please check that srv2 has the image on the underlying storage.
Please also check StarWind.cfg on srv2 for any references to the headers of the missing device.
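A read-only way to do that check is to list the device entries in the config and compare them with the devices you expect. This is a rough sketch: it assumes StarWind.cfg is XML with `<device name="...">` elements, and the sample text, device names, and function names below are made up for illustration rather than taken from a real config.

```python
import xml.etree.ElementTree as ET

# Made-up sample; a real StarWind.cfg will look different.
SAMPLE_CFG = """
<config>
  <devices>
    <device name="HAImage1" />
  </devices>
</config>
"""


def device_names(cfg_text):
    """Return the name attribute of every <device> element found."""
    root = ET.fromstring(cfg_text)
    return [el.get("name") for el in root.iter("device") if el.get("name")]


def missing_devices(cfg_text, expected):
    """Return the expected device names that the config does not mention."""
    present = set(device_names(cfg_text))
    return [name for name in expected if name not in present]
```

To run it against the real file, read a copy of StarWind.cfg into a string and pass it as `cfg_text`; the sketch never writes to the file.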

To get the storage up, mark the only remaining device as synchronized and recreate the replica. Make sure to:
-check whether there is an imgfile on the underlying storage. Remove it before recreating the mirror if you are 100% sure that the image you plan to recreate has up-to-date data.
-confirm that the underlying storage is there for srv2.
-confirm that the headers directory under C:\Program Files\StarWind Software\StarWind on srv2 does not contain the directory or headers for the device you are about to recreate.
-remove the replication partner by running the appropriate script on srv1.
-mark the node as synchronized MANUALLY as described here viewtopic.php?f=5&t=6779&p=36805&hilit= ... %3E#p36805
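The file-system checks from the list above can be sketched as a read-only helper. Everything here is an assumption for illustration: the function name, its parameters, and the idea that leftover headers can be recognized by a name starting with the device name. It only reports findings and never deletes anything.

```python
from pathlib import Path


def precheck_replica(storage_dir, image_file, headers_dir, device_name):
    """Return a list of findings to resolve before recreating the replica."""
    storage_dir = Path(storage_dir)
    image_file = Path(image_file)
    headers_dir = Path(headers_dir)
    findings = []

    # The underlying storage has to be there for the node.
    if not storage_dir.is_dir():
        findings.append(f"underlying storage is not available: {storage_dir}")

    # An old image file should be reviewed (and possibly removed) first.
    if image_file.exists():
        findings.append(f"old image file still present: {image_file}")

    # Assumption: leftover headers carry the device name as a prefix.
    if headers_dir.is_dir():
        leftovers = sorted(
            p.name for p in headers_dir.iterdir()
            if p.name.lower().startswith(device_name.lower())
        )
        if leftovers:
            findings.append(
                f"leftover headers for {device_name}: {', '.join(leftovers)}"
            )
    return findings
```

An empty list means none of the three conditions was hit; removing the partner and marking the node as synchronized still have to be done with the scripts referenced above.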

It looks like the partner was not created properly and then another outage happened. As a result, the system could not recover on its own because the conditions for storage availability were not met (i.e., the partner did not come online).

P.S. Consider the StarWind Support plan, as the system needs a configuration review and remediation by a technician. To get a quote, you can reach support@starwind.com using 1216883 and the link to this thread as your reference. We will also tell you more about HA concepts and handling the system.
Mareo
Posts: 18
Joined: Tue May 21, 2024 11:35 am

Wed Sep 18, 2024 11:13 am

There is no device that can publish the storage over iSCSI: one is not synchronized, and another is missing. Please see srv02 has the image on the underlying storage.
Please also check StarWind.cfg on srv2 for any references to the headers of the missing device.


YES!! This helped! Thanks! There was a line missing: the one featuring the <device name=....> that contains the device and, I guess, the iSCSI target for the second node. It synced.

I'll check with my superiors regarding the StarWind Support program, just so we can mitigate these issues or resolve them more quickly.

Thanks again Yaroslav!
yaroslav (staff)
Staff
Posts: 3171
Joined: Mon Nov 18, 2019 11:11 am

Wed Sep 18, 2024 11:34 am

Hi,

Great news :)
I am really happy to read that it helped!