Hyper Converged Power Outage

janderson133
Posts: 9
Joined: Tue Apr 26, 2022 11:41 pm

Wed May 04, 2022 5:31 am

Hello,

In my lab I'm testing a hyper-converged scenario on VMware using the VSAN for VMware OVF template. Here are my hosts and guests:

- vmHost1 with guest starwind-vsan1
- vmHost2 with guest starwind-vsan2
- vmHost3 with guest starwind-vsan3

I created a 2TB volume with the "master" image on starwind-vsan1 and the "partner" image on starwind-vsan2. starwind-vsan3 is the witness.

After a power outage where all hosts go down, the starwind-vsan guests come back up, but the storage is marked as unsynchronized. I have to manually force one of the images as synced (I've only used master to date, but I'm guessing I could use partner if I knew the master went down before the partner). A 5-hour full synchronization is then started :/ I'm 99% sure this is by design.
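A minimal sketch of the guess above, assuming the image that stayed online longest holds the newest data: pick the node that went down last as the one to force as synchronized. The node names come from this lab; the timestamps and the `pick_sync_source` helper are hypothetical, and this is not StarWind's own logic or API.

```python
from datetime import datetime


def pick_sync_source(last_online):
    """Return the name of the node that went offline last.

    last_online maps node name -> datetime the node was last seen running.
    Assumption: the node that stayed up longest holds the newest data.
    """
    if not last_online:
        raise ValueError("no nodes given")
    return max(last_online, key=last_online.get)


# Hypothetical shutdown times for the two storage nodes (witness left out).
last_online = {
    "starwind-vsan1": datetime(2022, 5, 4, 5, 10, 0),
    "starwind-vsan2": datetime(2022, 5, 4, 5, 10, 7),
}
source = pick_sync_source(last_online)
print(source)
```

With these made-up timestamps the second node went down later, so it would be the candidate to mark as synchronized.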

I thought I read that the storage wasn't supposed to be used during a full sync? Is this true? If yes, that really stinks because waiting 5 hours to bring up the guests is going to be a problem.
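For a rough sense of scale, here is a back-of-the-envelope calculation of the average rate implied by those numbers. It assumes the full sync copies the entire 2 TB volume and that "2TB" means 2 * 10^12 bytes; neither is confirmed in this thread, and the helper name is made up for illustration.

```python
def average_sync_rate_mb_s(volume_bytes, duration_hours):
    """Average throughput in MB/s (10**6 bytes) to copy the volume in the given time."""
    if duration_hours <= 0:
        raise ValueError("duration must be positive")
    return volume_bytes / (duration_hours * 3600) / 1e6


volume_bytes = 2 * 10**12  # assumption: 2 TB in decimal units
rate = average_sync_rate_mb_s(volume_bytes, 5)
print(f"{rate:.0f} MB/s")
```

That works out to roughly 111 MB/s, which is close to what a single 1 Gbit/s link can carry (125 MB/s at most). The link speed is not stated here, so treat that comparison as a guess about where the bottleneck might be.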

I noticed the vmHosts will remount the storage while it is being synced. However, I/O actions like storage vMotioning a guest to the iSCSI target will cause the "master" starwind-vsan1 guest to crash. Maybe this is expected behavior during a full sync?

Thanks for the help - Jeff
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed May 04, 2022 1:56 pm

Greetings,

I don't think "master" and "partner" are the right terms here, since this is active-active replication. See more at https://forums.starwindsoftware.com/vie ... f=5&t=5731.
Most probably, you need to mark the storage as synchronized because of the write-back cache. See the correct procedure for fixing the mutual not-synchronized condition at https://knowledgebase.starwindsoftware. ... -blackout/.

"A 5 hour full synchronization is then started :/ I'm 99% sure this is by design."
Yes, a full synchronization after the outage is normal.

"I thought I read that the storage wasn't supposed to be used during a full sync? Is this true? If yes, that really stinks because waiting 5 hours to bring up the guests is going to be a problem."
During full synchronization, only the synchronized partner is available over iSCSI. The node that is not synchronized is unavailable over iSCSI until it is in sync.

"I noticed the vmHosts will remount the storage while it is being synced. However, I/O actions like storage vMotioning a guest to the iSCSI target will cause the "master" starwind-vsan1 guest to crash. Maybe this is expected behavior during a full sync?"
This is probably related to the I/O load during enhanced storage vMotion. Please avoid I/O-intensive tasks during the full synchronization.
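
The availability rule described above can be pictured with a small toy model. It only illustrates the stated behavior (a node serves iSCSI only while it is synchronized); the `Node` class and `iscsi_targets_available` function are hypothetical and do not correspond to any StarWind API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    synchronized: bool


def iscsi_targets_available(nodes):
    """Return the names of nodes allowed to serve iSCSI: only synchronized ones."""
    return [n.name for n in nodes if n.synchronized]


# During a full sync, one node is the synchronized source and the other is catching up.
nodes = [Node("starwind-vsan1", True), Node("starwind-vsan2", False)]
available = iscsi_targets_available(nodes)
print(available)
```

In this picture all host I/O lands on the single synchronized node while it is also feeding the sync, which fits the advice to keep heavy I/O off the storage until the sync finishes.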
janderson133
Posts: 9
Joined: Tue Apr 26, 2022 11:41 pm

Wed May 04, 2022 11:39 pm

Thanks for the info. I just called the images master and partner like the PowerShell scripts do :)

Sounds like I will always have this sync issue during an outage of all nodes because of the write-back cache.

As such, I created a 3-node cluster (2 active storage nodes and 1 witness) with write-through cache, and the nodes didn't synchronize after creation. I had to run the PowerShell script to mark one of the nodes as synced (I just thought that was weird, because when the cluster is created with write-back cache, synchronization starts automatically).

I performed the same crash test: I powered off all 3 storage VMs at the same time, then brought them all back online at the same time, and it came up perfectly. A full sync was initiated, but I didn't have to do anything.

I appreciate the help.

Jeff
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu May 05, 2022 3:38 pm

Greetings,

Yes, this synchronization behavior is often observed for HA devices with write-back cache enabled. It can also happen to devices with no cache, though. See how to address such incidents in production at https://knowledgebase.starwindsoftware. ... -blackout/