2 Nodes Node Majority + Witness node

Hendrik
Posts: 9
Joined: Thu Jun 15, 2017 4:51 am

Thu Jul 10, 2025 3:36 pm

Hi,

I've successfully created a 2-node HA device using Node Majority with a witness node on Windows (StarWind VSAN Free installed as an application, version 8.0.19551), and I can see that device HAImage1 synchronizes. I'm using the script below:

param($addr="172.17.11.50", $port=3261, $user="root", $password="starwind",
$addr2="172.17.11.39", $port2=$port, $user2=$user, $password2=$password,
$addrW="172.17.11.41", $portW=$port, $userW=$user, $passwordW=$password,
#common
$initMethod="Clear",
$size=1024,
$sectorSize=4096,
$failover=1,
$bmpType=2,
$bmpStrategy=0,
#primary node
$imagePath="C:\StarWind",
$imageName="HAImage1",
$createImage=$true,
$storageName="",
$targetName="iqn.2008-08.com.starwindsoftware:md200256-targetha31",
$targetAlias="targetha31",
$poolName="pool1",
$syncSessionCount=1,
$aluaOptimized=$true,
$cacheMode="none",
$cacheSize=0,
$syncInterface="#p2=172.17.11.39:3260;#p3=172.17.11.41:3260",
$hbInterface="",
$createTarget=$true,
$bmpFolderPath="C:\StarWind\Bitmap",
#secondary node
$imagePath2=$imagePath,
$imageName2="HAImage1",
$createImage2=$true,
$storageName2="",
$targetName2="iqn.2008-08.com.starwindsoftware:p716-10-1004-partnerha32",
$targetAlias2="partnerha32",
$poolName2="pool1",
$syncSessionCount2=1,
$aluaOptimized2=$false,
$cacheMode2=$cacheMode,
$cacheSize2=$cacheSize,
$syncInterface2="#p1=172.17.11.50:3260;#p3=172.17.11.41:3260",
$hbInterface2="",
$createTarget2=$true,
$bmpFolderPath2=$bmpFolderPath,
#third node
$imagePathW=$imagePath,
$imageNameW="witness33",
$targetNameW="iqn.2008-08.com.starwindsoftware:gesdaprodsrv999-witness33",
$targetAliasW="witness33",
$syncInterfaceW="#p1=172.17.11.50:3260;#p3=172.17.11.39:3260",
$hbInterfaceW="",
$nodeTypeW=8
)

Import-Module StarWindX

try
{
Enable-SWXLog

$server = New-SWServer -host $addr -port $port -user $user -password $password

$server.Connect()

$firstNode = new-Object Node

$firstNode.HostName = $addr
$firstNode.HostPort = $port
$firstNode.Login = $user
$firstNode.Password = $password
$firstNode.ImagePath = $imagePath
$firstNode.ImageName = $imageName
$firstNode.Size = $size
$firstNode.CreateImage = $createImage
$firstNode.StorageName = $storageName
$firstNode.TargetAlias = $targetAlias
$firstNode.CreateTarget = $true
$firstNode.SyncInterface = $syncInterface
$firstNode.HBInterface = $hbInterface
$firstNode.PoolName = $poolName
$firstNode.SyncSessionCount = $syncSessionCount
$firstNode.ALUAOptimized = $aluaOptimized
$firstNode.CacheMode = $cacheMode
$firstNode.CacheSize = $cacheSize
$firstNode.FailoverStrategy = $failover
$firstNode.BitmapStoreType = $bmpType
$firstNode.BitmapStrategy = $bmpStrategy
$firstNode.BitmapFolderPath = $bmpFolderPath
#
# device sector size. Possible values: 512 or 4096(May be incompatible with some clients!) bytes.
#
$firstNode.SectorSize = $sectorSize

$secondNode = new-Object Node

$secondNode.HostName = $addr2
$secondNode.HostPort = $port2
$secondNode.Login = $user2
$secondNode.Password = $password2
$secondNode.ImagePath = $imagePath2
$secondNode.ImageName = $imageName2
$secondNode.CreateImage = $createImage2
$secondNode.StorageName = $storageName2
$secondNode.TargetAlias = $targetAlias2
$secondNode.CreateTarget = $true
$secondNode.SyncInterface = $syncInterface2
$secondNode.HBInterface = $hbInterface2
$secondNode.SyncSessionCount = $syncSessionCount2
$secondNode.ALUAOptimized = $aluaOptimized2
$secondNode.CacheMode = $cacheMode2
$secondNode.CacheSize = $cacheSize2
$secondNode.FailoverStrategy = $failover
$secondNode.BitmapStoreType = $bmpType
$secondNode.BitmapStrategy = $bmpStrategy
$secondNode.BitmapFolderPath = $bmpFolderPath2

$thirdNode = new-Object Node

$thirdNode.HostName = $addrW
$thirdNode.HostPort = $portW
$thirdNode.Login = $userW
$thirdNode.Password = $passwordW
$thirdNode.ImagePath = $imagePathW
$thirdNode.ImageName = $imageNameW
$thirdNode.TargetAlias = $targetAliasW
$thirdNode.SyncInterface = $syncInterfaceW
$thirdNode.HBInterface = $hbInterfaceW
$thirdNode.FailoverStrategy = $failover
$thirdNode.Type = $nodeTypeW

$device = Add-HADevice -server $server -firstNode $firstNode -secondNode $secondNode -thirdNode $thirdNode -initMethod $initMethod

$syncState = $device.GetPropertyValue("ha_synch_status")

while ($syncState -ne "1")
{
#
# Refresh device info
#
$device.Refresh()

$syncState = $device.GetPropertyValue("ha_synch_status")
$syncPercent = $device.GetPropertyValue("ha_synch_percent")

Start-Sleep -m 2000

Write-Host "Synchronizing: $($syncPercent)%" -foreground yellow
}
}
catch
{
Write-Host $_ -foreground red
}
finally
{
# Disconnect only if the server object was actually created
if ($server) { $server.Disconnect() }
}

The problem appears when I simulate the following:
on Node2 (p2): stop the StarWind VSAN service
on Node1 (p1): restart the StarWind VSAN service
on Node3 (p3/witness): do nothing, the service keeps running
When I do that, the LUN disappears (checked on Node1). It mounts again after I start the StarWind VSAN service on Node2.
I have already checked port connectivity to and from all nodes on ports 3260 and 3261; all are healthy.
Why can't Node1 and Node3 form a quorum?

Thanks,

Hendrik Saiyan
yaroslav (staff)
Staff
Posts: 3680
Joined: Mon Nov 18, 2019 11:11 am

Thu Jul 10, 2025 4:51 pm

Welcome to the StarWind forum!
Under normal conditions, I'd suggest avoiding that. Good test, though.
After such an outage, you need to start the service on all nodes or manually mark the device as synchronized.
Hendrik

Fri Jul 11, 2025 7:11 am

Hi, yaroslav

I've tried that, even shutting down node2 and then restarting node1 while keeping node3 up and running, but the LUN still failed.
I even downloaded a trial license key, applied it to the 3 nodes, and used the management console to create the replication partner with Node Majority and a witness, but the same problem happened. What am I missing? Could the version need to be updated (even though I can't find any build other than 19551 at https://www.starwindsoftware.com/tmplin ... ind-v8.exe)?

Thanks for your patience,

Regards,

Hendrik Saiyan
yaroslav (staff)

Fri Jul 11, 2025 7:57 am

You need to manually mark the device as synchronized (this is normal behavior). See this how-to: https://knowledgebase.starwindsoftware. ... -blackout/
That is why we recommend strictly following the restart procedures (see https://knowledgebase.starwindsoftware. ... vers%20for for a restart and https://knowledgebase.starwindsoftware. ... production for a simultaneous shutdown).
When in a node majority, you need at least 2 nodes up, and at least one of them must be a storage provider. The witness does not carry any storage, so leaving it running alone has no effect on LUN availability. To resume after an outage, you need all replication partners to be up. If write-back cache is used, you will always need manual intervention.
Note that you will not be able to revert to the free version after the trial.
The latest build is always available at the URL you shared.

P.S. While the trial key is active, I'd suggest reaching out to the person who provided it for a tech call.
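
The availability rule described in this reply can be written down as a small PowerShell sketch. This is only a simplified model of the behavior as stated in the thread, not StarWind's implementation: the function Test-NodeMajorityLun and its parameters are hypothetical, and nothing here calls StarWindX or inspects a real device.

```powershell
# Illustrative model of the Node Majority rule from this thread.
# Function and parameter names are hypothetical.
function Test-NodeMajorityLun {
    param(
        [bool[]]$StorageNodesUp,                   # one flag per data-carrying node
        [bool]$WitnessUp,                          # the witness holds no data
        [bool]$RestartedWhilePartnerDown = $false  # surviving storage node was restarted
    )

    # Count the storage nodes that are running.
    $storageUp = @($StorageNodesUp | Where-Object { $_ }).Count
    $totalUp   = $storageUp + [int]$WitnessUp

    # A restarted storage node waits for its replication partner (or a manual
    # "mark as synchronized") before it serves the LUN again.
    if ($RestartedWhilePartnerDown) { return $false }

    # At least 2 of the 3 nodes up, and at least one of them must carry data.
    return ($totalUp -ge 2) -and ($storageUp -ge 1)
}

# Node2 stopped, Node1 and the witness untouched
Test-NodeMajorityLun -StorageNodesUp $true, $false -WitnessUp $true
# The test from the first post: Node2 stopped, then Node1 restarted
Test-NodeMajorityLun -StorageNodesUp $true, $false -WitnessUp $true -RestartedWhilePartnerDown $true
# Only the witness left running
Test-NodeMajorityLun -StorageNodesUp $false, $false -WitnessUp $true
```

In this model only the first call returns True, which matches the observation that the LUN stays gone until Node2 is back or the device is marked as synchronized by hand.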
Hendrik

Sat Jul 12, 2025 4:58 am

Hi yaroslav,

According to your explanation:
"When in a node majority, you need at least 2 nodes to be up, at least one is a storage provider. Witness is not carrying any storage, so leaving it running alone will not make any impact on LUN availability."

My understanding is that the same situation applies in my scenario with 2 HA nodes and 1 witness node: Node1, a storage provider (at least one), is up after the restart, and Node3, the witness node, is also up.
I know I can manually mark the device as synchronized on Node1, but I want to avoid that because a full synchronization will happen when Node2 comes back up, which takes quite a while for a 1 TB LUN.

I tried the trial license only to get a grasp of the GUI configuration and to make sure I followed everything according to StarWind's article/guide.

Regards,

Hendrik Saiyan
yaroslav (staff)

Sat Jul 12, 2025 11:09 am

You are always welcome!
Yes, you got it right. In a 2-node HA + witness setup, StarWind HA can tolerate only one node being down.
Speaking of outages, take a look at disk journals: https://knowledgebase.starwindsoftware. ... a-devices/.
P.S. It would be great to schedule a tech call with you so we can help you more. Request the trial key and contact the person who gives you that key so our tech can help you with a proof-of-concept setup.
Hendrik

Sun Jul 13, 2025 10:40 am

Hi, yaroslav

Thanks for your help, well noted.
About the continuous journal: in the PowerShell script, should $bmpStrategy be set to 1 or 2?

Regards,

Hendrik Saiyan
yaroslav (staff)

Sun Jul 13, 2025 1:52 pm

You are always welcome :) I am happy I could help.
2.
Please also make sure that the bitmap storage is as fast as or faster than the device that houses *img.

P.S. The bitmap will not save you from the consequences of node isolation or of turning off a node if write-back caching is enabled.
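
For reference, a minimal sketch of that change against the script posted at the top of the thread. Only the value 2 for the continuous journal comes from this reply; the D:\ path is a hypothetical placeholder for a volume that is at least as fast as the one holding the image file.

```powershell
# In the param() block: select the continuous journal, per the reply above.
$bmpStrategy = 2

# Hypothetical path; point it at storage as fast as (or faster than) the image disk.
$bmpFolderPath = "D:\StarWind\Bitmap"

# The node objects pick these values up exactly as in the original script:
# $firstNode.BitmapStrategy   = $bmpStrategy
# $firstNode.BitmapFolderPath = $bmpFolderPath
```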
Hendrik

Mon Jul 14, 2025 4:10 pm

Hi, yaroslav

Thanks for the continuous journal value.
I also want to confirm whether 2-node HA with heartbeat behaves differently from 2-node HA with a witness node (Node Majority).
I just tested 2-node HA (heartbeat): when Node1 is up and Node2 is down, the LUN stays up and running until Node1 is rebooted; then I must mark the device as synchronized, and Node1 keeps the LUN running without Node2.
With 2-node HA + witness node (Node Majority), on the other hand, when Node1 is up and both Node2 and Node3 are down, the LUN is gone immediately.
Is the above correct?

Regards,

Hendrik Saiyan
yaroslav (staff)

Mon Jul 14, 2025 4:54 pm

You are always welcome.
Thanks for sharing your tests!
"I also want to confirm whether 2-node HA with heartbeat behaves differently from 2-node HA with a witness node (Node Majority)."
Yes. Node Majority implies a network configuration that is vulnerable to split-brain, which is why StarWind VSAN makes the storage go completely dark rather than let dangerous I/O happen.
"I just tested 2-node HA (heartbeat): when Node1 is up and Node2 is down, the LUN stays up and running until Node1 is rebooted; then I must mark the device as synchronized, and Node1 keeps the LUN running without Node2."
Devices created with the heartbeat failover strategy can still be accessed over iSCSI when only one node is running.
"then I must mark the device as synchronized, and Node1 keeps the LUN running without Node2"
I don't follow this bit. There's no need for manual intervention unless node1 is also restarted while node2 is down, or write-back cache is used.
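
The heartbeat behavior described in this reply can be sketched the same way. Again, this is a simplified model based only on the statements in this thread: Get-HeartbeatLunState and its parameters are hypothetical names, nothing here calls StarWindX, and the write-back cache case is deliberately left out.

```powershell
# Illustrative model of the heartbeat failover strategy as described in this thread.
# Function and parameter names are hypothetical.
function Get-HeartbeatLunState {
    param(
        [int]$StorageNodesUp,                # 0, 1 or 2 running storage nodes
        [bool]$LastNodeRestarted = $false    # the only running node was restarted
    )

    if ($StorageNodesUp -le 0) { return 'Offline' }
    if ($StorageNodesUp -ge 2) { return 'Online' }

    # Exactly one node is running: it keeps serving iSCSI unless it was
    # restarted while its partner was down.
    if ($LastNodeRestarted) { return 'NeedsManualSync' }
    return 'Online'
}

# Node2 down, Node1 untouched
Get-HeartbeatLunState -StorageNodesUp 1
# Node2 down, Node1 rebooted
Get-HeartbeatLunState -StorageNodesUp 1 -LastNodeRestarted $true
```

The contrast with Node Majority is in the single-node case: with heartbeat, one running storage node is enough to keep the LUN online, while Node Majority needs a second node (storage or witness) alongside it.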
Hendrik

Tue Jul 15, 2025 3:22 pm

Hi, yaroslav

"I don't follow this bit. There's no need for manual intervention unless node1 is also restarted while node2 is down, or write-back cache is used."
>>> Yes, that's what I meant

Thanks for your help

Regards,
Hendrik Saiyan
yaroslav (staff)

Tue Jul 15, 2025 3:47 pm

Hendrik,

Always happy to help.