Hyper Converged Power Outage

Posted: Wed May 04, 2022 5:31 am
by janderson133
Hello,

In my lab I'm testing a hyper-converged scenario on VMware using the VSAN for VMware OVF template. Here are my hosts/guests:

- vmHost1 with guest starwind-vsan1
- vmHost2 with guest starwind-vsan2
- vmHost3 with guest starwind-vsan3

I created a 2TB volume with the "master" image on starwind-vsan1 and the "partner" image on starwind-vsan2. starwind-vsan3 is the witness.

During a power outage where all hosts go down, the starwind-vsan guests come back up but the storage is marked as unsynchronized. I have to manually force one of the images as synced (I've only used master to date, but I'm guessing I could use partner if I knew the master went down before the partner). A 5 hour full synchronization is then started :/ I'm 99% sure this is by design.

I thought I read that the storage wasn't supposed to be used during a full sync? Is this true? If yes, that really stinks because waiting 5 hours to bring up the guests is going to be a problem.

I noticed the vmHosts will remount the storage while it is being synced. However, I/O actions like storage vMotioning a guest to the iSCSI target will cause the "master" starwind-vsan1 guest to crash. Maybe this is expected behavior during a full sync?

Thanks for the help - Jeff
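
Jeff's guess above (mark as synced whichever image was still up last) can be sketched as a small helper. This is only an illustration: the node names come from the post, but the timestamps and the `pick_sync_source` function are hypothetical, and how you actually learn each node's last-alive time (service logs, hypervisor events) is assumed, not something the thread specifies.

```python
from datetime import datetime

def pick_sync_source(last_alive: dict[str, datetime]) -> str:
    """Return the storage node that was seen alive most recently.

    Assumption: the node that went down last holds the newest data,
    so it is the candidate to mark as synchronized.
    """
    if not last_alive:
        raise ValueError("no nodes given")
    return max(last_alive, key=last_alive.get)

# Hypothetical last-alive times gathered after the outage.
# The witness (starwind-vsan3) holds no data, so it is left out.
last_alive = {
    "starwind-vsan1": datetime(2022, 5, 3, 22, 14, 5),
    "starwind-vsan2": datetime(2022, 5, 3, 22, 14, 9),
}

source = pick_sync_source(last_alive)
print(f"candidate to mark as synchronized: {source}")
```

If both nodes lost power at effectively the same instant, the timestamps will not settle the question, and the choice has to be made on other grounds.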

Re: Hyper Converged Power Outage

Posted: Wed May 04, 2022 1:56 pm
by yaroslav (staff)
Greetings,

I think "master" and "partner" are not the correct terms here, as we are talking about active-active replication. See more at https://forums.starwindsoftware.com/vie ... f=5&t=5731.
Most probably, you need to mark the storage as synchronized because of the write-back cache. See the correct procedure for fixing the mutual not-synchronized condition at https://knowledgebase.starwindsoftware. ... -blackout/.
A 5 hour full synchronization is then started :/ I'm 99% sure this is by design.
Yes, full synchronization after the outage is OK.
I thought I read that the storage wasn't supposed to be used during a full sync? Is this true? If yes, that really stinks because waiting 5 hours to bring up the guests is going to be a problem.
Only the synchronized partner is available over iSCSI during full synchronization. The unsynchronized node is not available over iSCSI until it is in sync.
I noticed the vmHosts will remount the storage while it is being synced. However, I/O actions like storage vMotioning a guest to the iSCSI target will cause the "master" starwind-vsan1 guest to crash. Maybe this is expected behavior during a full sync?
This is probably related to the I/O load during enhanced storage vMotion. Please avoid I/O-intensive tasks during the full synchronization.
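
For a sense of scale, the figures in the thread (a 2TB volume, a 5 hour full sync) imply an average copy rate that can be worked out directly. The sketch below assumes decimal units (1 TB = 10**12 bytes) and that a full sync copies the whole volume; the function names are made up for this example.

```python
# Back-of-the-envelope estimate only.
# Assumptions: 1 TB = 10**12 bytes, and the full sync copies the entire volume.
TB = 10**12

def sync_throughput_mb_s(volume_bytes: int, duration_s: float) -> float:
    """Average throughput (MB/s) needed to copy volume_bytes in duration_s."""
    return volume_bytes / duration_s / 1e6

def sync_hours(volume_bytes: int, throughput_mb_s: float) -> float:
    """Hours needed to copy volume_bytes at a sustained throughput_mb_s."""
    return volume_bytes / (throughput_mb_s * 1e6) / 3600

observed = sync_throughput_mb_s(2 * TB, 5 * 3600)
print(f"observed average: {observed:.1f} MB/s ({observed * 8 / 1000:.2f} Gbit/s)")
print(f"same volume at 500 MB/s: {sync_hours(2 * TB, 500):.2f} h")
```

That works out to roughly 111 MB/s, which is close to what a single 1 Gbit/s link can carry. If that is the case in this lab, the sync alone would take most of the available bandwidth, which would fit the advice to avoid heavy I/O such as storage vMotion while it runs.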

Re: Hyper Converged Power Outage

Posted: Wed May 04, 2022 11:39 pm
by janderson133
Thanks for the info. I just called the images master and partner like the PowerShell scripts do :)

Sounds like I will always have this sync issue during an outage of all nodes because of the write-back cache.

As such, I created a 3 node cluster (2 active storage nodes and 1 witness) with write-through cache, and the nodes didn't synchronize after creation. I had to run the PowerShell script to mark one of the nodes as synced (I just thought that was weird, because when the cluster is created with write-back cache, synchronization starts automatically).

I performed that same crash test - powered off all 3 storage VMs at the same time. I brought them all back online at the same time and it came up perfectly - a full sync was initiated but I didn't have to do anything.

I appreciate the help.

Jeff

Re: Hyper Converged Power Outage

Posted: Thu May 05, 2022 3:38 pm
by yaroslav (staff)
Greetings,

Yes, this synchronization behavior is often observed for HA devices with write-back cache enabled. It might happen to devices with no cache too, though. See how to address such incidents in production: https://knowledgebase.starwindsoftware. ... -blackout/