Hello, I have an interesting situation: we recently recovered from a two-node cluster failure. The two identical servers hosting our VSAN each had a faulty DIMM, and those faults ended up destroying both motherboards. One server was completely inoperable, but the other could still run on one CPU. That server also had a PCIe SSD used as a cache disk, but its PCIe riser did not work, so we had no access to that cache disk.
As a result, I could not bring the VSAN online because it could not find one of its devices. I tracked down the cause and edited a copy of the swdsk file to remove that SSD. Recovering it as an HA device did not work, but attaching it as a FLAT device did, so I was back in business.
The hardware has now been replaced in both nodes, and I have access to the SSD again. The problem is that the server that stayed up has been running VMs while the other was down, so it now holds newer data.
1. I'm afraid that if I reconnect the other server to the network, I'll lose data during a sync.
2. I'm not sure how to restore High Availability, because when I tried to reuse my saved swdsk files, the device would not become active.
3. I'm still running production VMs, so I need to minimize downtime.
Does anyone have suggestions, or is it just a matter of taking full backups and hoping for the best?