Witness failed

Davis
Posts: 24
Joined: Tue Jan 23, 2018 10:12 am

Thu Jan 25, 2018 4:47 pm

Hello,

During lab tests of a 2-node hyperconverged cluster I found one unacceptable thing.
I have set up three HA SW (StarWind) devices: one witness and two storage disks. The first storage device has the same parameters as the witness device (without cache).
The witness device is connected only locally, using one iSCSI path; the others are connected using two paths.
The cluster has 3 NICs: NIC1 is used for client + cluster + heartbeat, NIC2 for iSCSI + SW Sync, NIC3 for SW Sync.
The witness device was set up as the cluster witness disk.
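
To make the path layout concrete, here is a minimal Python sketch of which node could still reach each device over iSCSI when the SW service is stopped on one node. It assumes that "two paths" means a loopback path plus a path to the partner node. The device names and the `reachable_from` helper are made up for illustration and are not part of any StarWind tool.

```python
# Hypothetical model of the lab layout; all names are illustrative only.
# Each device maps an initiator node to the target nodes it connects to over iSCSI.
PATHS = {
    "witness":  {1: [1], 2: [2]},          # loopback (127.0.0.1) only
    "storage1": {1: [1, 2], 2: [2, 1]},    # assumed: loopback + partner path
    "storage2": {1: [1, 2], 2: [2, 1]},    # assumed: loopback + partner path
}


def reachable_from(device, node, stopped_service_nodes):
    """Return True if `node` still has at least one iSCSI path to `device`
    whose target node is still running the SW service."""
    return any(target not in stopped_service_nodes
               for target in PATHS[device][node])


if __name__ == "__main__":
    stopped = {1}  # SW service stopped on node 1
    for device in PATHS:
        for node in (1, 2):
            ok = reachable_from(device, node, stopped)
            print(f"{device} from node {node}: {'reachable' if ok else 'no path'}")
```

Under this simplified model, node 2 keeps its loopback path to the witness when the service is stopped on node 1, so the witness disk would be expected to stay available on node 2.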

Powering off one node does not create any problem for the cluster.
But then I started to test local storage failure and stopped the SW service on the first node.
The witness disk on the first node stopped, which is OK.
But the witness disk on the second node also stopped!
Both nodes lost the witness and the cluster shut down!

I repeated this several times with the same result.
Then I changed the cluster quorum settings to a file share witness and stopped the SW service again.
The cluster stayed alive, but this witness device went offline again. So its behaviour is not related to the witness role.
I've attached a screenshot of this moment.
As you can see, iSCSI is connected (via 127.0.0.1) and the disk is shown in Disk Management, but it is failed in Cluster Manager.

To me it looks like we cannot use a SW witness device. Am I wrong?
SW v 8.0.0.11818
Attachments
WitnessDown.JPG (215.85 KiB)
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 25, 2018 7:37 pm

Please submit a support ticket through the website for detailed investigation of this issue.
Davis
Posts: 24
Joined: Tue Jan 23, 2018 10:12 am

Fri Jan 26, 2018 9:03 am

New info.
Each cluster disk has an owner node.
Let's say the witness disk (here I mean the disk with only a local iSCSI path) is owned by node 1.
After I stop StarWindService on this node, the disk goes offline on both nodes.
I start the service and the disk comes back online shortly.
Now I stop StarWindService on the second node and ... no disk goes offline.
All disks are online.
It seems that when the nodes can communicate with each other, a failure of the disk on one node forces the disk offline on all nodes, even if the disk is OK for the second node.
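
The two test runs above can be written down as a small Python sketch. It only encodes what was seen in this lab (two data points), not how the cluster actually decides. The function name and the "owner node" rule are assumptions drawn from those observations.

```python
def observed_witness_state(owner_node, stopped_node):
    """Summarise the lab observations for the disk with only a local iSCSI path.

    Assumed rule (from two test runs only): the disk goes offline for the
    whole cluster when StarWindService is stopped on the disk's owner node,
    and stays online when the service is stopped on the other node.
    """
    return "offline" if stopped_node == owner_node else "online"


if __name__ == "__main__":
    # Disk owned by node 1, service stopped on node 1, then on node 2.
    print(observed_witness_state(owner_node=1, stopped_node=1))
    print(observed_witness_state(owner_node=1, stopped_node=2))
```

If this rule holds, moving disk ownership to node 2 and repeating the test should reverse the outcome; that case has not been tested here.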
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Fri Jan 26, 2018 9:55 am

We will deal with this, too.