Hello, guys.
Could you explain what the point is of connecting the witness targets only through the loopback address (127.0.0.1)?
And what exact behavior should we expect during a failover?
I've run a few tests and got some results I didn't expect.
I'm not sure I've done everything correctly, so could you please have a look?
Here's the test config for a 2-node Hyper-V cluster:
2 x HP DL380 Gen10 with Windows Server 2016 Datacenter
96 GB RAM on each
HPE Smart Array P816i-a SR controller
8 x 1.2 TB 12Gb SAS HDD in RAID 10
4-port 1Gb NIC teamed and connected to a Hyper-V switch (1 vNIC for management and VSAN heartbeat, 1 vNIC for the cluster heartbeat)
2-port 10Gb NIC (1 port for the VSAN sync channel and 1 port for VSAN iSCSI)
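In case it matters, here's a rough PowerShell sketch of that network layout (the team, switch, and adapter names are placeholders, not my actual ones):

    # LBFO team across the four 1Gb ports
    New-NetLbfoTeam -Name "Team1G" -TeamMembers "NIC1","NIC2","NIC3","NIC4" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

    # Hyper-V switch on top of the team, plus the two management-OS vNICs
    New-VMSwitch -Name "vSwitch-Team1G" -NetAdapterName "Team1G" -AllowManagementOS $false
    Add-VMNetworkAdapter -ManagementOS -Name "Mgmt-VSAN-Heartbeat" -SwitchName "vSwitch-Team1G"
    Add-VMNetworkAdapter -ManagementOS -Name "Cluster-Heartbeat" -SwitchName "vSwitch-Team1G"

    # The two 10Gb ports are left un-teamed: one for VSAN sync, one for VSAN iSCSI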
I've created and connected the targets (one for the witness and two for CSVs) according to your guide:
https://www.starwindsoftware.com/resour ... erver-2016
Everything looked good and VM migration worked as expected, so I decided to break something a little.
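Before the tests, just to make sure we're on the same page: this is roughly how the targets end up connected on node 1 as I understood the guide (the IQNs and the partner IP below are placeholders, not my real values) - the witness only through 127.0.0.1, the CSV targets through loopback plus the partner's iSCSI address.

    # Witness target: loopback portal only, per the guide
    New-IscsiTargetPortal -TargetPortalAddress "127.0.0.1"
    Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:node1-witness" `
        -TargetPortalAddress "127.0.0.1" -IsPersistent $true

    # CSV targets: loopback portal plus the partner's 10Gb iSCSI address
    New-IscsiTargetPortal -TargetPortalAddress "172.16.20.2"
    Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:node1-csv1" `
        -TargetPortalAddress "127.0.0.1" -IsPersistent $true -IsMultipathEnabled $true
    Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:node2-csv1" `
        -TargetPortalAddress "172.16.20.2" -IsPersistent $true -IsMultipathEnabled $true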
test 1: node 1 (not the owner of the disk witness) - stop the StarWind service
result: all VMs and CSVs stay online
iSCSI targets switch to the partner's storage
auto-resync after the service is started again
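For both service-stop tests (this one and test 2 below) I stopped and restarted the service roughly like this; the service name is from my install, so please correct me if it should be referenced differently:

    # Stop the StarWind VSAN service on the node under test
    # (verify the name first with Get-Service *StarWind*)
    Stop-Service -Name "StarWindService"

    # Watch the iSCSI sessions on the surviving node while the service is down
    Get-IscsiSession | Select-Object TargetNodeAddress, IsConnected

    # Bring the service back and let the devices resynchronize
    Start-Service -Name "StarWindService"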
test 2: node 2 (owner of the disk witness) - stop the StarWind service
result: all VMs and CSVs stay online, but the disk witness goes offline
iSCSI targets switch to the partner's storage
auto-resync after the service is started again
test 3: node 1 (not the owner of the disk witness) - shut down all network traffic except VSAN
result: node 1 is isolated
all disks stay online
all iSCSI targets on node 2 stay connected
the VM from node 1 is restarted on node 2
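By "shut down all network traffic except VSAN" (here and in test 4) I mean disabling every adapter on the node except the 10Gb sync/iSCSI ports, roughly like this (adapter names are placeholders):

    # Disable everything except the 10Gb VSAN sync and iSCSI ports
    Get-NetAdapter |
        Where-Object { $_.Name -notin "VSAN-Sync", "VSAN-iSCSI" } |
        Disable-NetAdapter -Confirm:$false

    # Re-enable everything after the test
    Get-NetAdapter | Enable-NetAdapter -Confirm:$false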
test 4: node 2 (owner of the disk witness) - shut down all network traffic except VSAN
result: the cluster goes down
I have to run "Start-ClusterNode -ForceQuorum" on node 1
the cluster comes back online with one node
all CSVs are online, but the disk witness stays offline
all iSCSI targets on node 1 stay connected
all VMs are restarted on node 1
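Roughly what I ran on node 1 to bring it back ("Node1" is a placeholder for my actual node name):

    # The cluster would not form on its own, so force quorum on the surviving node
    Start-ClusterNode -Name "Node1" -ForceQuorum

    # Then check that the node and the roles came back
    Get-ClusterNode
    Get-ClusterGroup | Select-Object Name, OwnerNode, State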
test 5: node 1 (not the owner of the disk witness) - shut down all networking
result: node 1 is isolated
all disks stay online
all iSCSI targets on node 2 stay connected
the VM from node 1 is restarted on node 2
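For tests 3 and 5 I confirmed where the VM roles and CSVs ended up by running something like this on node 2 (standard cluster cmdlets, nothing StarWind-specific):

    # Which node owns each VM role and each CSV after the failover
    Get-ClusterGroup | Where-Object GroupType -eq "VirtualMachine" |
        Select-Object Name, OwnerNode, State
    Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State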
As you can see, in the fourth scenario there is no failover at all: the cluster goes down because quorum is lost, even though all of the local storage is still connected to the host.
What do you think about this? Did I make any mistakes in the config?
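In case it helps, this is how I check the quorum configuration and the state of the witness resource:

    # Quorum type and the witness resource state
    Get-ClusterQuorum
    (Get-ClusterQuorum).QuorumResource | Select-Object Name, OwnerNode, State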