I created 2 LUNs (2 targets, 2 images), connected them with MPIO, and set them up as the MSCS quorum witness (1 GB) and the Hyper-V VM role storage (128 GB). 2 servers = 2 nodes, 1 NIC per server for everything (lab test), all on a single switch. I simulate failovers by simply pulling individual NICs, creating a full disconnect. Failover works, but I've had mixed results with WB (massively corrupted VM). I'd like to know the self-repair expectations before trying another round of tests tonight with WT disks. I'm not sure the problems during testing have been entirely WB-related, but I'm hoping so.
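Since I rerun the same sequence every night, I've been using a tiny sanity check before each test to confirm both iSCSI portals are actually back before I pull a NIC again. Just a sketch for my lab: the node IPs and the default iSCSI port 3260 are my own assumptions, nothing StarWind- or MSCS-specific.

```python
# Hypothetical pre-test check: are both iSCSI portals reachable again?
# The IPs below are assumed lab addresses, not anything from the product.
import socket

NODES = {"node1": "192.168.1.11", "node2": "192.168.1.12"}  # assumed lab IPs
ISCSI_PORT = 3260  # default iSCSI target port

def portal_reachable(ip: str, port: int = ISCSI_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, ip in NODES.items():
        state = "reachable" if portal_reachable(ip) else "UNREACHABLE"
        print(f"{name} ({ip}): {state}")
```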
1. Should I expect automatic self-repair once a failed node comes back online and both nodes are up again?
I ask because as I repeat the failover tests, the VM gets more and more corrupted until it doesn't even start anymore. This was with WB; I'm going to try WT tonight. I'm just wondering if I need to actively follow specific steps when a node comes back online, or if I can expect the services to fix it automagically as long as both nodes are alive and can talk to each other. They should know who has the latest data and keep it consistent, right? The silliest thing I saw was a node going offline and the LUN going into a failed state, which makes the cluster pointless and fragile.
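Just to check my own reasoning on why WB would be worse here, below is a toy model of caching in general, not how StarWind actually implements it (that part is my assumption): with write-back, a write is acknowledged before it hits stable storage, so a hard disconnect can throw away data the VM already believes is safe; with write-through, the acknowledgement only comes after the data is persisted.

```python
# Toy model of why a hard disconnect hurts more with write-back (WB) than
# write-through (WT). Generic caching semantics only - an assumption for
# illustration, not StarWind's actual implementation.

class Disk:
    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.cache = []      # acknowledged but not yet persisted (WB only)
        self.persisted = []  # what actually survives a crash

    def write(self, block: str) -> None:
        if self.write_back:
            self.cache.append(block)      # WB: ack now, flush later
        else:
            self.persisted.append(block)  # WT: ack only after persisting

    def crash(self) -> list:
        # A pulled NIC / dead node: anything still in cache is lost.
        self.cache.clear()
        return self.persisted

for mode in (True, False):
    d = Disk(write_back=mode)
    for i in range(5):
        d.write(f"block{i}")   # the VM thinks all 5 writes are safe
    survived = d.crash()       # disconnect before any flush
    print("WB" if mode else "WT", "survived:", survived)
```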
I'm redoing all the tests tonight with WT and would like to know what to expect from self-repair in the free version, so I can simulate failovers with the correct expectations in mind. I'm hoping I can rely on the services to take care of their own health as long as they can talk to each other.
I need the cluster to be able to fail over without corruption or any manual intervention if a node goes offline - any node. And I'd like the cluster to fully repair itself when the node comes back online, effectively preparing it for another failover if needed. I can't have a node set itself as failed just because the other one disappears - that's the whole point of HA.