2 Nodes StarWind Virtual SAN (Free Edition)

Software-based VM-centric and flash-friendly VM storage + free version
Hendrik
Posts: 12
Joined: Thu Jun 15, 2017 4:51 am

Tue Jun 02, 2026 6:43 am

Hi,
I've set up 2 nodes with the heartbeat failover strategy using a PowerShell script. Everything runs smoothly unless the 2nd node (backup node) is not powered on at the same time as the 1st node (production), which causes the 1st node's status to go to "Not Synchronized". I know we can do "mark as synchronized" on the 1st node so the LUN becomes active and the VMs inside it can run normally. The problem is that a full synchronization occurs when the 2nd node is up and running, which takes a long time for a 500 GB .img file.
To overcome this, I usually back up the HA.swdsk file, then do remove-hapartner on the 1st node (while the 2nd node is still disconnected). After that the LUN works because it runs as a standalone device.
After the 2nd node is up, I do stop-service on starwindservices, then copy the backed-up HA.swdsk to the /headers folder, then start the service again. The 2nd node then does a fast synchronization, not a full one.
The problem with this method is that we have to stop the service, which halts the VMs until we start the StarWind service again.

My question is:
Can we do add-hapartner or add-hadevice from the 1st node to the 2nd node using the same serial ID, without creating any new image, new HA device, or new targets?

Thanks for your support,

Regards,
Hendrik
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Tue Jun 02, 2026 7:20 am

Hendrik,

StarWind HA storage is designed in a way that both nodes can be "active" if synchronized. See more about device priorities at viewtopic.php?t=5731.
If you want to retain the data, you need to mark one node as synchronized and then wait for the full sync to complete. There is no other way around it.
The workaround you are doing is not supported because it goes against the logic of StarWind VSAN, and I doubt that there is any replication going on. Each target is "actual data" that is frozen in time and is altered chaotically by the I/O of each client. The OS, in turn, sees every target as the "same" and treats it as the "same" disk.
That's how you get the file system corrupted. That's why the only non-disruptive way to resume the node after a blackout is to mark the node as synchronized and go through the full sync. Lower the sync priority if you are seeing a performance impact.

Speaking of full sync duration, can I please have the storage configuration (storage type and RAID level, and settings) and some numbers (i.e., how long does it take)? Also, do you have any cache enabled on StarWind VSAN devices? What is the priority of replication that is set? Do you see any reconnects in the log of the not-synchronized StarWind VSAN device?
Hendrik
Posts: 12
Joined: Thu Jun 15, 2017 4:51 am

Tue Jun 02, 2026 3:12 pm

Hi Yaroslav,

The full sync for approx. 500 GB took around 90 minutes. I only use a 1G link for the sync and heartbeat channels.
So the idea is to wait for 10 minutes when the 1st node starts. If the 2nd node doesn't show up, then go to standalone mode (either remove-hapartner or just use the imagefile1 device with a target). Then check every 15 minutes whether the 2nd node is up, and do add-hapartner or any other way so the 2nd node can sync without using full mode.
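As a rough sanity check on those numbers, the small Python sketch below compares the reported sync time with what an ideal 1 Gbit/s link could do. It assumes "500 GB" means 500 x 10^9 bytes and that "1G" means 1 Gbit/s with no protocol overhead; both are assumptions, not figures confirmed in this thread.

```python
# Assumptions (not confirmed in the thread):
#   - the image is 500 * 10**9 bytes
#   - the sync link is 1 Gbit/s and carries nothing but sync traffic
IMAGE_BYTES = 500 * 10**9
LINK_BITS_PER_S = 10**9
OBSERVED_SYNC_S = 90 * 60  # "around 90 minutes"


def observed_throughput_mb_s(size_bytes, seconds):
    """Average transfer rate in MB/s (10**6 bytes per second)."""
    return size_bytes / seconds / 10**6


def ideal_transfer_minutes(size_bytes, link_bits_per_s):
    """Minutes needed to move size_bytes over a link with zero overhead."""
    return size_bytes * 8 / link_bits_per_s / 60


print(round(observed_throughput_mb_s(IMAGE_BYTES, OBSERVED_SYNC_S), 1), "MB/s observed")
print(round(ideal_transfer_minutes(IMAGE_BYTES, LINK_BITS_PER_S), 1), "minutes at line rate")
```

Under those assumptions the observed rate is about 93 MB/s against a theoretical 125 MB/s, so even a perfect run over this link would need a bit over an hour.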

Is it possible to do that using PowerShell?
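Only the timing part of that idea (a 10-minute grace period, then a check every 15 minutes) can be sketched without touching StarWind at all. The Python sketch below does just that. The function names, the 30-second step inside the grace period, and the use of a TCP connect to the default iSCSI port 3260 as the "is the partner up" test are all assumptions made for illustration. The two callbacks are hypothetical placeholders: the sketch does not say which StarWind command, if any, can safely go into them.

```python
import socket
import time

ISCSI_PORT = 3260  # default iSCSI port; assumed to be what the partner listens on


def partner_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_partner(is_up, on_grace_expired, on_partner_up,
                  grace_s=600, grace_step_s=30, retry_s=900, sleep=time.sleep):
    """Wait for the partner node, first eagerly, then at a slower pace.

    is_up            -- callable returning True once the partner answers
    on_grace_expired -- hypothetical hook, called once if the grace period runs out
    on_partner_up    -- hypothetical hook, called once when the partner appears
    """
    waited = 0
    # Phase 1: check every grace_step_s seconds for up to grace_s seconds.
    while waited < grace_s:
        if is_up():
            on_partner_up()
            return "joined-in-grace"
        sleep(grace_step_s)
        waited += grace_step_s

    # Phase 2: the partner did not show up in time; keep checking every retry_s.
    on_grace_expired()
    while not is_up():
        sleep(retry_s)
    on_partner_up()
    return "joined-late"
```

A call could look like `watch_partner(lambda: partner_reachable("192.0.2.10"), go_standalone, rejoin)`, where the address and both hook names are made up for the example.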

Regards,
Hendrik
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Tue Jun 02, 2026 3:58 pm

Could you please tell me more about the underlying storage configuration?
You can just start both nodes at the same time.
Hendrik
Posts: 12
Joined: Thu Jun 15, 2017 4:51 am

Wed Jun 03, 2026 2:36 pm

Hi Yaroslav,

We have 2 buildings side by side with separate electrical supplies and different working hours. Building A, with the production server (1st node), works from 08:00 to 17:00, while building B (2nd node) works from 09:00 to 18:00. Each server must be turned off at the end of working hours and turned on again manually the next day by the responsible staff at each building. The production server at building A hosts VMs which must be running at 08:00. The production server has an iSCSI connection to 127.0.0.1 (itself) to create the LUN which hosts the Hyper-V VMs, and this LUN is replicated to the backup server (2nd node) at building B over a 1G link.
If we must mark as synchronized every day, then at 09:00 the 2nd node starts a full sync. I'm just afraid that if something happens to the production server during those 90 minutes, we have nothing to fail over to.
That's why I'm asking whether it is possible to run as a standalone device first on the 1st node, then create the HA partner later when the 2nd node is up, without halting or stopping the LUN.

Thanks for your support,

Regards,
Hendrik
yaroslav (staff)
Staff
Posts: 4309
Joined: Mon Nov 18, 2019 11:11 am

Wed Jun 03, 2026 3:15 pm

Hi,

I guess this scenario is where active-passive replication could shine. Active-active mirroring is not quite OK due to the system handling routines.
In other words, I am not sure StarWind VSAN fits here well. Sure, you can have 2 standalone devices, but there will be an inevitable lag between sites if you do active-passive replication.