Lost Connection to Witness when Synchronizing

muhlitfan
Posts: 11
Joined: Thu Oct 09, 2025 6:08 am

Tue Nov 25, 2025 7:37 am

Hello, I have some questions regarding my implementation of StarWind VSAN Free using an SMB Witness. The node that is the synchronization source shows the log below. It looks like connections from the HAImage node to my SMB Witness are timing out, but synchronization between the two nodes still proceeds.
11/25 12:18:54.232552 15a0 HA: *** SmbWitnessImp::SmbClient::createFile: SmbWitnessImp::SmbClient::createFile(\\DOMAIN01POJ\FSWStarwind\witness.dat, 0x00008002, 0x00000020, 0x00000180)
11/25 12:18:55.290001 15a0 HA: *** SmbWitnessImp::SmbClient::createFile: SmbWitnessImp::SmbClient::createFile(\\DOMAIN01POJ\FSWStarwind\witness.dat, 0x00008002, 0x00000020, 0x00000180)
11/25 12:18:56.373109 15a0 HA: *** SmbWitnessImp::SmbClient::createFile: SmbWitnessImp::SmbClient::createFile(\\DOMAIN01POJ\FSWStarwind\witness.dat, 0x00008002, 0x00000020, 0x00000180)
11/25 12:19:00.370984 7b78 HA: *** SmbWitnessImp::waitForJob: open timeout(4000 ms) occured!
11/25 12:19:00.371093 7b78 HA: *** Witness::open: WitnessImp open failed, timeout(4000 ms) occurred!
11/25 12:19:00.371122 7b78 HA: *** Witness::acquireSimple: Simple acquisition failed, can't open witness!
11/25 12:19:00.376136 7b78 HA: HANode::simpleCheckQuorumNodeMojority: Information about available connections with partners: partnersCount = 1, availablePartnersCount = 1
11/25 12:19:00.376188 7b78 HA: HANode::simpleCheckQuorumNodeMojority: Total(witness is excluded): participantsCount = 2, possibleVotes = 2
11/25 12:19:00.376215 7b78 HA: *** HANode::witnessAcquisitionLost: We have lost witness acquisition after keeping, but we have quorum! It's very rare case, check other conditions!
11/25 12:19:12.840008 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:19:46.837193 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:20:20.864590 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:20:54.861933 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:21:07.632214 15a0 HA: *** SmbWitnessImp::synchronousOpen: ReadFile(\\DOMAIN01POJ\FSWStarwind\witness.dat) failed, error code 0x2!
11/25 12:21:58.916362 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:23:02.925840 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:24:06.936911 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:25:11.113085 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
11/25 12:26:15.139291 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!
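To see how often the witness timeouts recur, the log lines above can be tallied with a small script. This is only a sketch: the regular expressions are my reading of the line layout shown here, not a documented StarWind log format, `summarize_witness_timeouts` is a hypothetical helper name, and the year is assumed to be 2025 because the log lines do not carry one.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the lines above:
#   MM/DD HH:MM:SS.ffffff <thread-id> HA: [*** ]<message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) (?P<thread>[0-9a-f]+) "
    r"HA: (?:\*\*\* )?(?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(
    r"SmbWitnessImp::waitForJob: (?P<kind>\w+) timeout\((?P<ms>\d+) ms\)"
)

SAMPLE = [
    "11/25 12:19:00.370984 7b78 HA: *** SmbWitnessImp::waitForJob: open timeout(4000 ms) occured!",
    "11/25 12:19:12.840008 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!",
    "11/25 12:19:46.837193 2c94 HA: *** SmbWitnessImp::waitForJob: available timeout(4000 ms) occured!",
]


def summarize_witness_timeouts(lines, year=2025):
    """Return (count per timeout kind, gaps in seconds between consecutive timeouts)."""
    counts = Counter()
    stamps = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # not a log line in the assumed layout
        timeout = TIMEOUT_RE.search(parsed.group("msg"))
        if not timeout:
            continue  # a log line, but not a witness timeout
        counts[timeout.group("kind")] += 1
        # The log omits the year, so the assumed one is prepended before parsing.
        stamps.append(
            datetime.strptime(f"{year}/{parsed.group('ts')}", "%Y/%m/%d %H:%M:%S.%f")
        )
    gaps = [round((b - a).total_seconds(), 1) for a, b in zip(stamps, stamps[1:])]
    return counts, gaps
```

Calling `summarize_witness_timeouts(SAMPLE)` returns the per-kind counts and the spacing between timeouts, which makes it easier to tell a single blip from a witness share that stays unreachable.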
1. What are the consequences if my node(s) lose connection to the Witness while they are synchronizing?
2. Will failover fail if one of the nodes goes down after the synchronization process is done?
3. Is there any way to increase the timeout or the number of retries for the connection to my Witness?

Thank you
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Tue Nov 25, 2025 8:16 am

Glad to hear from you again!
1. What are the consequences if my node(s) lose connection to the Witness while they are synchronizing?
Losing connection to the witness alone is not a big deal. However, if a node also can't communicate with its remaining partner, i.e., it stays isolated, the shared storage will stop working.
2. Will failover fail if one of the nodes goes down after the synchronization process is done?
I don't quite follow. StarWind VSAN offers active-active replicated storage. As long as the link between the data nodes is up, both of them publish storage and therefore stay synchronized. If one of them goes down, fast synchronization happens.
3. Is there any way to increase the timeout or the number of retries for the connection to my Witness?
AFAIK, not for SMB witness.
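Since the SMB witness timeout can't be tuned, one way to check whether the share itself is slow is to time plain lookups against it from the node. Below is a minimal Python sketch and not part of StarWind: `probe_path` and `slow_or_failed` are hypothetical helper names, the 4000 ms threshold is simply the value printed in the log, and the probe runs under your own account rather than the StarWind service account, so the results are only indicative.

```python
import os
import time


def probe_path(path, attempts=5, pause_s=1.0):
    """Time `attempts` metadata lookups on `path`.

    Returns a list of (elapsed_ms, error) tuples; error is None on success.
    """
    results = []
    for i in range(attempts):
        start = time.perf_counter()
        error = None
        try:
            os.stat(path)  # metadata lookup only; does not open or lock a file
        except OSError as exc:
            error = exc
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append((elapsed_ms, error))
        if i < attempts - 1:
            time.sleep(pause_s)
    return results


def slow_or_failed(results, threshold_ms=4000.0):
    """Return the probes that failed or took at least `threshold_ms`."""
    return [r for r in results if r[1] is not None or r[0] >= threshold_ms]
```

For example, `slow_or_failed(probe_path(r"\\DOMAIN01POJ\FSWStarwind"))` probes the share folder from the log rather than witness.dat itself, to avoid touching the file the service is using. An empty list means every lookup succeeded in under 4 seconds.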
muhlitfan
Posts: 11
Joined: Thu Oct 09, 2025 6:08 am

Tue Nov 25, 2025 8:37 am

yaroslav (staff) wrote:
Tue Nov 25, 2025 8:16 am
I don't quite follow. StarWind VSAN offers active-active replicated storage. As long as the link between the data nodes is up, both of them publish storage and therefore stay synchronized. If one of them goes down, fast synchronization happens.
11/25 12:19:00.371122 7b78 HA: *** Witness::acquireSimple: Simple acquisition failed, can't open witness!
11/25 12:19:00.376136 7b78 HA: HANode::simpleCheckQuorumNodeMojority: Information about available connections with partners: partnersCount = 1, availablePartnersCount = 1
11/25 12:19:00.376188 7b78 HA: HANode::simpleCheckQuorumNodeMojority: Total(witness is excluded): participantsCount = 2, possibleVotes = 2
11/25 12:19:00.376215 7b78 HA: *** HANode::witnessAcquisitionLost: We have lost witness acquisition after keeping, but we have quorum! It's very rare case, check other conditions!
Sorry, what I meant is: from the logs above, it seems the participant count and the possible vote count are both 2. Correct me if I'm wrong, but doesn't a successful failover require 3 participants with 2 votes still available? And what does "We have lost witness acquisition after keeping, but we have quorum!" actually mean?

yaroslav (staff) wrote:
Tue Nov 25, 2025 8:16 am
AFAIK, not for SMB witness.
I see, so is it available for Node Witness and Heartbeat? Could you tell me where the config is so I can change the timeout and the number of retries?
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Tue Nov 25, 2025 9:31 am

Correct me if I'm wrong, but doesn't a successful failover require 3 participants with 2 votes still available?
2 votes are enough for a quorum, but not enough to withstand one server's failure.
And what does "We have lost witness acquisition after keeping, but we have quorum!" actually mean?
It means that the 2 nodes still talk to each other and therefore form a quorum. Once communication with the witness is restored, the node should reconnect to it.
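The vote counting can be illustrated with a simplified strict-majority model. This is a sketch for intuition only, assuming one vote each for the two data nodes and the witness; `has_quorum` is a hypothetical helper and not how StarWind implements its checks.

```python
def has_quorum(votes_available, total_votes):
    """Strict majority: more than half of all configured votes must be reachable."""
    return 2 * votes_available > total_votes


TOTAL_VOTES = 3  # assumption: node A + node B + witness, one vote each

scenarios = {
    "all up": 3,
    "witness lost, both nodes up": 2,   # the situation in the log above
    "witness up, one node down": 2,
    "witness lost, one node down": 1,   # the surviving node is alone
}

for name, votes in scenarios.items():
    state = "quorum" if has_quorum(votes, TOTAL_VOTES) else "NO quorum"
    print(f"{name}: {votes}/{TOTAL_VOTES} votes -> {state}")
```

In this model the cluster keeps quorum with the witness gone as long as both nodes see each other, but a node failure in that state leaves a single vote, which is why the witness should be brought back before relying on failover.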
I see, so is it available for Node Witness and Heartbeat? Could you tell me where the config is so I can change the timeout and the number of retries?
AFAIK, Node Majority and heartbeat don't stop retrying. They do have entries in the config file to alter the time between retries, but I wouldn't recommend altering those under normal operation.
muhlitfan
Posts: 11
Joined: Thu Oct 09, 2025 6:08 am

Tue Nov 25, 2025 11:25 pm

I see, okay. Thank you for your response, yaroslav. Have a good day!
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Wed Nov 26, 2025 6:19 am

Good luck with your project :)
P.S. If it is a test cluster, give NVMe-oF a try: https://www.starwindsoftware.com/techni ... ew-program (more about NVMe-oF: https://www.starwindsoftware.com/resour ... s-nvme-of/)