VSA for vSphere (Linux) - 2-node hyperconverged home lab

Software-based VM-centric and flash-friendly VM storage + free version

whodat342
Posts: 1
Joined: Mon Jan 03, 2022 3:28 am

Mon Jan 03, 2022 3:44 am

Hi all,

Hoping someone here can answer my questions. I have two physical machines running ESXi 6.7 U3 bare metal, with vCenter/vSphere 7 deployed in a VM - all working fine. My goal is to get StarWind VSAN working in a 2-node hyperconverged setup using only local storage within each host. This is just a test lab, so no RAID or iSCSI hardware. Am I right in assuming that I can just use disks carved out of an existing ESXi datastore, made available as separate hard disks to the Linux StarWind VSA VMs?

I've been following this guide - https://www.starwindsoftware.com/resour ... here/ - which has worked fine; I've configured all the networks in vCenter and have the two StarWind VSAs deployed and working - I can access the web-based console for each StarWind VM (<IPADDRESS>:9090) and see both hard drives (SSD and HDD) as well. Through the console I have formatted each as XFS, and they are mounted at /mnt/disk1 and /mnt/disk2 respectively.

I also have a Windows 10 VM where I have installed the StarWind Management Console and the PowerShell scripts.

1. What are the correct scripts to run to configure HA - i.e., create the storage pool and add disks to the pool for each StarWind VM?
2. Can I use the PowerShell scripts with the already-formatted disks on each StarWind VSA VM?
3. Do I need to run the PowerShell scripts separately for each StarWind VSA?
4. Is there a correct order for creating HA - i.e., creating the pool and adding the disks via PowerShell?
5. How do I verify everything is working - i.e., test the HA failover?

Thanks
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Mon Jan 03, 2022 2:13 pm

Hi,

I'm basically trying to do the same, with a similar setup.

1. The scripts are under
C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell
if you've used the default installation path. The script for a 2-node cluster is CreateHA_2.ps1.
2. As far as I understand it, yes. The script will create an image file on the respective mount point.
4. That should be done in one step by the creation PowerShell script, as I understand it.
5. Once I get it working, I'd put a test VM on the storage and just shut down one of the nodes, I guess.
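For the shut-down-one-node test, a minimal reachability sketch from any Linux box on the network might look like this. The helper name is mine, and the IPs are the node addresses used in the scripts in this thread; adjust to your lab:

```shell
#!/bin/sh
# Sketch of a basic failover check: with a test VM on the HA datastore,
# confirm both StarWind targets answer on the iSCSI port (3260), power
# one node off, and confirm the surviving target still answers.
# The helper name and IPs are illustrative only.
check_target() {
    if timeout 2 bash -c ">/dev/tcp/$1/3260" 2>/dev/null; then
        echo "$1: iSCSI port open"
    else
        echo "$1: unreachable"
    fi
}
check_target 192.168.0.251
check_target 192.168.0.252
```

The VM on the datastore staying responsive while one target is unreachable is the actual pass criterion; the port check just tells you which node is down.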

I'm one step further and am getting devices created, but for me it's just not syncing (the iSCSI connections are there).
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Tue Jan 04, 2022 6:21 am

Yes, these are the correct steps.

Prandur,
How large is the device? Could you kindly share the logs from both nodes with me? You can use the log collector script https://knowledgebase.starwindsoftware. ... collector/. Also, sharing the script you used would be helpful here.
Finally, please make sure to have enough space in /mnt/your-mount-point-name (i.e., the mount point that represents the underlying storage, not the /mnt directory itself).
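A quick way to do that space check on each VSA, assuming the mount point names from the first post (the helper name is mine):

```shell
#!/bin/sh
# Sketch: report free space in MB at the mount point that will hold the
# StarWind image file, so it can be compared against the planned device size.
free_mb() {
    # df -P for portable output, -m for 1 MB blocks; row 2, column 4 is "Available"
    df -Pm "$1" | awk 'NR==2 {print $4}'
}
# Example (mount point name from the first post; device size 1024 MB):
#   [ "$(free_mb /mnt/disk1)" -ge 1024 ] && echo "enough space on /mnt/disk1"
```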
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Wed Jan 05, 2022 10:55 am

Hi yaroslav,

sorry for the late answer.

I'm using the following script:

Code:

param($addr="192.168.0.251", $port=3261, $user="root", $password="starwind",
	$addr2="192.168.0.252", $port2=$port, $user2=$user, $password2=$password,
#common
	$initMethod="Clear",
#	$size=1550000,
    $size=1024,
	$sectorSize=512,
	$failover=0,
#primary node
	$imagePath="/mnt/ssd",
	$imageName="ssd01",
	$createImage=$true,
	$storageName="",
	$targetAlias="ssd01",
	$autoSynch=$true,
	$poolName="pool1",
	$syncSessionCount=1,
	$aluaOptimized=$true,
	$cacheMode="none",
	$cacheSize="",
	$syncInterface="#p2=172.16.255.12:3260" -f $addr2,
	$hbInterface="#p2=172.16.250.12:3260" -f $addr2,
	$createTarget=$true,
#secondary node
	$imagePath2="/mnt/ssd",
	$imageName2="ssd02",
	$createImage2=$true,
	$storageName2="",
	$targetAlias2="ssd02",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$false,
	$cacheMode2=$cacheMode,
	$cacheSize2=$cacheSize,
	$syncInterface2="#p1=172.16.255.11:3260" -f $addr,
	$hbInterface2="#p1=172.16.250.11:3260" -f $addr,
	$createTarget2=$true
	)
	
Import-Module StarWindX

try
{
	Enable-SWXLog

	$server = New-SWServer -host $addr -port $port -user $user -password $password

	$server.Connect()

	$firstNode = new-Object Node

	$firstNode.HostName = $addr
	$firstNode.HostPort = $port
	$firstNode.Login = $user
	$firstNode.Password = $password
	$firstNode.ImagePath = $imagePath
	$firstNode.ImageName = $imageName
	$firstNode.Size = $size
	$firstNode.CreateImage = $createImage
	$firstNode.StorageName = $storageName
	$firstNode.TargetAlias = $targetAlias
	$firstNode.AutoSynch = $autoSynch
	$firstNode.SyncInterface = $syncInterface
	$firstNode.HBInterface = $hbInterface
	$firstNode.PoolName = $poolName
	$firstNode.SyncSessionCount = $syncSessionCount
	$firstNode.ALUAOptimized = $aluaOptimized
	$firstNode.CacheMode = $cacheMode
	$firstNode.CacheSize = $cacheSize
	$firstNode.FailoverStrategy = $failover
	$firstNode.CreateTarget = $createTarget
    
	#
	# device sector size. Possible values: 512 or 4096(May be incompatible with some clients!) bytes. 
	#
	$firstNode.SectorSize = $sectorSize
    
	$secondNode = new-Object Node

	$secondNode.HostName = $addr2
	$secondNode.HostPort = $port2
	$secondNode.Login = $user2
	$secondNode.Password = $password2
	$secondNode.ImagePath = $imagePath2
	$secondNode.ImageName = $imageName2
	$secondNode.CreateImage = $createImage2
	$secondNode.StorageName = $storageName2
	$secondNode.TargetAlias = $targetAlias2
	$secondNode.AutoSynch = $autoSynch2
	$secondNode.SyncInterface = $syncInterface2
	$secondNode.HBInterface = $hbInterface2
	$secondNode.SyncSessionCount = $syncSessionCount2
	$secondNode.ALUAOptimized = $aluaOptimized2
	$secondNode.CacheMode = $cacheMode2
	$secondNode.CacheSize = $cacheSize2
	$secondNode.FailoverStrategy = $failover
	$secondNode.CreateTarget = $createTarget2
        
	$device = Add-HADevice -server $server -firstNode $firstNode -secondNode $secondNode -initMethod $initMethod
    
	while ($device.SyncStatus -ne [SwHaSyncStatus]::SW_HA_SYNC_STATUS_SYNC)
	{
		$syncPercent = $device.GetPropertyValue("ha_synch_percent")
	        Write-Host "Synchronizing: $($syncPercent)%" -foreground yellow

		Start-Sleep -m 2000

		$device.Refresh()
	}
}
catch
{
	Write-Host $_ -foreground red 
}
finally
{
	$server.Disconnect()
}
The setup is as follows:
2 ESXi hosts with 2 network adapters each.
192.168.0.0/24 and 172.16.250.0/24 are on one adapter, for management and heartbeat.
172.16.255.0/24 is on the second adapter, for sync.

I was just doing the creation again. I can see the iSCSI connections for syncing, and the console events report that all sync connections are present.
Then I see "Disk operation failed. Disk path: C:\StarWind\storage\mnt\ssd\ssd01.img. Error code (25)." after which the events report the initial sync as failed.
Right now I'm testing with 1024 MB, which should later become 1550000 MB. The disk itself is a little larger, so that should fit.

The script stays at "Synchronizing: 0%".
I couldn't attach the logs to the thread as they are too big. I've uploaded them to my Nextcloud instance (https://nextcloud.tanaros.org/index.php ... SMtWRjBpgK)
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed Jan 05, 2022 8:09 pm

Try this script https://forums.starwindsoftware.com/vie ... p+3#p31505 (2x HB interfaces are better).
Create a 1 GB file first, then expand it with ExtendDevice from C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell.
Did the device get created cleanly in the Management Console (i.e., does it have any warning signs/exclamation marks/etc.)? Also, make sure the iSCSI Target Server role is not installed on your system.
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Wed Jan 05, 2022 10:53 pm

Hi,

as I was using a Windows 10 machine, which isn't officially supported, I switched to a fresh eval server installation.
I updated the script as you suggested, with the second HB on the management network.

When I run the script, both devices are created. I can also see the respective image files in the Linux file system on both VMs.
The event log in the Management Console states the following after the devices are created:
1. State changed to synchronizing
2. Heartbeat connections established on both interfaces
3. Synchronizing connections established
4. On Node01, "Disk operation failed. Disk path: C:\StarWind\storage\mnt\ssd\ssd01.img. Error code (25)." (which is weird, as the target VMs are both Linux)
5. Node state changed to "Not synchronized"

I then have warning signs highlighting that the devices are not synchronized.
No iSCSI role is installed; only the iSCSI Initiator is used to mount a volume from a local NAS.

The updated config, for completeness:

Code:

param($addr="192.168.0.251", $port=3261, $user="root", $password="starwind",
	$addr2="192.168.0.252", $port2=$port, $user2=$user, $password2=$password,
#common
	$initMethod="Clear",
#	$size=1550000,
    $size=1024,
	$sectorSize=512,
	$failover=0,
#primary node
	$imagePath="/mnt/ssd",
	$imageName="ssd01",
	$createImage=$true,
	$storageName="",
	$targetAlias="ssd01",
	$autoSynch=$true,
	$poolName="pool1",
	$syncSessionCount=1,
	$aluaOptimized=$true,
	$cacheMode="none",
	$cacheSize="",
	$syncInterface="#p2=172.16.255.12:3260" -f $addr2,
	$hbInterface="#p2=172.16.250.12:3260,192.168.0.252:3260" -f $addr2,
	$createTarget=$true,
#secondary node
	$imagePath2="/mnt/ssd",
	$imageName2="ssd02",
	$createImage2=$true,
	$storageName2="",
	$targetAlias2="ssd02",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$true,
	$cacheMode2=$cacheMode,
	$cacheSize2=$cacheSize,
	$syncInterface2="#p1=172.16.255.11:3260" -f $addr,
	$hbInterface2="#p1=172.16.250.11:3260,192.168.0.251:3260" -f $addr,
	$createTarget2=$true
	)
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 06, 2022 4:22 am

"On Node01, 'Disk operation failed. Disk path: C:\StarWind\storage\mnt\ssd\ssd01.img. Error code (25).'" - this one points to a hiccup of some sort in the underlying storage. What is the NAS storage configuration?
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Thu Jan 06, 2022 10:07 am

It's one target in the iSCSI Initiator, mounted as E:.
I tried again after unmounting E: and stopping the initiator service, but I still get the same disk operation error. In all other regards the server is a completely fresh install.

I can see that the symlink from /opt/StarWind/StarWindVSA/drive_c/StarWind/storage/mnt to /mnt is correctly in place.

Code:

[root@vsan-hv01 storage]# pwd
/opt/StarWind/StarWindVSA/drive_c/StarWind/storage
[root@vsan-hv01 storage]# ls -altr
total 4
lrwxrwxrwx. 1 root root    4 Feb 27  2020 mnt -> /mnt
lrwxrwxrwx. 1 root root    6 Feb 27  2020 media -> /media
drwxr-xr-x. 2 root root   30 Feb 27  2020 .
drwxr-xr-x. 8 root root 4096 Jan  6 10:38 ..

Code:

[root@vsan-hv01 ssd]# pwd
/opt/StarWind/StarWindVSA/drive_c/StarWind/storage/mnt/ssd
[root@vsan-hv01 ssd]# ls -altrh
total 1.1G
drwxr-xr-x. 4 root root   29 Dec 29 15:47 ..
-rw-r--r--  1 root root 4.0K Jan  6 10:38 ssd01.swdsk
-rw-r--r--  1 root root 1.0G Jan  6 10:38 ssd01.img
-rw-r--r--  1 root root 4.0K Jan  6 10:38 ssd01_HA.swdsk
drwxr-xr-x  2 root root   64 Jan  6 10:38 .
edit:
I have 2 disks mounted on the vsan VMs. I tried creating the targets on the second one, with the same result except for the changed path.
When I look at the files after the script ran, I can see the respective files as I would expect them.

Even with debug logging enabled, I can't see anything more helpful than the following:

Code:

1/6 12:41:46.109369 12d debug: *** Swn_CheckCompletions: io_event for ov 0000000002420380, res 22, err 25!
1/6 12:41:46.109400 12d IMG: *** ImageFile_IoCompleted: Disk operation failed. Disk path: C:\StarWind\storage\mnt\nvme\ssd01.img. Error code: (25).
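The C:\ path in that log line is just how the StarWind service sees the Linux filesystem (it apparently runs under a Wine-style prefix, judging by the drive_c directory): C:\StarWind\storage maps to /opt/StarWind/StarWindVSA/drive_c/StarWind/storage, whose mnt symlink points at /mnt, as shown above. A tiny helper to translate log paths back to Linux paths for inspection (the helper name is mine):

```shell
#!/bin/sh
# Sketch: translate the Windows-style path from a StarWind VSA log line
# into the Linux path it resolves to through the storage/mnt symlink.
# printf is used instead of echo so backslashes are not interpreted.
vsa_to_linux_path() {
    printf '%s\n' "$1" | sed -e 's|^C:\\StarWind\\storage||' -e 's|\\|/|g'
}
vsa_to_linux_path 'C:\StarWind\storage\mnt\ssd\ssd01.img'   # /mnt/ssd/ssd01.img
```

That makes it easy to check whether the file the service is complaining about actually exists on the Linux side.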
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 06, 2022 2:28 pm

E: and stopping the initiator service
Are you connecting the NAS to the VM itself? Present the NAS as an iSCSI device to ESXi and put the VMDK there. Try the trial license for test purposes (NOTE: YOU WILL NOT BE ABLE TO SWITCH TO FREE WITHOUT REINSTALLING THE VM). This will show us whether the problem is with the script OR the setup.
Do you have NVMe drives in the NAS?
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Thu Jan 06, 2022 6:54 pm

Are you connecting the NAS to the VM itself? Present the NAS as an iSCSI device to ESXi and put the VMDK there. Try the trial license for test purposes (NOTE: YOU WILL NOT BE ABLE TO SWITCH TO FREE WITHOUT REINSTALLING THE VM). This will show us whether the problem is with the script OR the setup.
Do you have NVMe drives in the NAS?
That might be a misunderstanding.
As this is a limited learning homelab, I only have 2 ESXi hosts at the moment. The Windows VM also works as my Veeam backup server, which uses an iSCSI disk from my local NAS as storage.

The ESXi hosts themselves have 1 SATA SSD and 1 NVMe as local storage. On each storage device, the vSAN VMs have an eager-zeroed disk attached, mirrored in size and configuration.


For the setup: with the trial licence I am able to create an HA set from the Management Console. I found one slight difference: I was setting "/mnt/ssd" as the path, and the script created the image files right there.
When I set it up with the Management Console, it created an additional folder "/mnt/ssd/ssd01" and put the image files there.

When I delete only the replica on hv02 and re-add it with the Management Console after running the PowerShell script (hv01 had the mentioned disk operation problem in the events log), it again creates the additional folder on hv02, but the sync completes.

What I don't understand is why the Management Console creates the additional folder, as it appears to have an impact on how the creation works.

I added the set in the Management Console via

Code:

HV01
* add device (advanced)
* HDD
* virtual disk
* new virtual disk (name: ssd01; location: VSA Storage\mnt\ssd; size: 1 GB)
* thick-provisioned; 512-byte sectors
* no cache
* create new target, alias ssd01

After that, with the Replication Manager:
* add partner, synchronous replication
* Host IP: 192.168.0.252
* Heartbeat as failover strategy
* create new partner device (location: VSA Storage\mnt\ssd), target name also ssd01
* Networks:
** 172.16.255.0 for heartbeat and sync
** 172.16.250.0 and 192.168.0.0 for heartbeat
* sync from existing device
With these steps, the set gets successfully created. But from all I can see, the setup is identical to the script setup:

Code:

param($addr="192.168.0.251", $port=3261, $user="root", $password="starwind",
	$addr2="192.168.0.252", $port2=$port, $user2=$user, $password2=$password,
#common
	$initMethod="Clear",
#	$size=1550000,
    $size=1024,
	$sectorSize=512,
	$failover=0,
#primary node
	$imagePath="\mnt\ssd\",
	$imageName="ssd01",
	$createImage=$true,
	$storageName="",
	$targetAlias="ssd01",
	$autoSynch=$true,
	$poolName="pool1",
	$syncSessionCount=1,
	$aluaOptimized=$true,
	$cacheMode="none",
	$cacheSize="",
	$syncInterface="#p2=172.16.255.12:3260" -f $addr2,
	$hbInterface="#p2=172.16.250.12:3260,192.168.0.252:3260" -f $addr2,
	$createTarget=$true,
#secondary node
	$imagePath2="\mnt\ssd\",
	$imageName2="ssd01",
	$createImage2=$true,
	$storageName2="",
	$targetAlias2="ssd01",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$false,
	$cacheMode2=$cacheMode,
	$cacheSize2=$cacheSize,
	$syncInterface2="#p1=172.16.255.11:3260" -f $addr,
	$hbInterface2="#p1=172.16.250.11:3260,192.168.0.251:3260" -f $addr,
	$createTarget2=$true
	)
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 06, 2022 8:02 pm

Hi,

Thanks for your update and clarifications. I thought the test img had been deleted manually after the test.
Anyway, glad to know that it works now! Thanks for sharing your knowledge.
Please make sure not to use the same servers as shared storage to store backups (learn more about the 3-2-1 backup rule).
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Thu Jan 06, 2022 9:12 pm

Hi,

Sorry, but it only works when doing it with the Management Console.
Using the script still results in the same error. I could only "fix" it by removing the replica and doing that part via the Management Console. But this approach is not possible with the free licence.
So to switch to a long-term solution for the homelab, I'd still have to do the setup via the script.

I tried to replicate the steps taken in my last post. As far as I can see and understand, the Management Console setup and the setup done via PowerShell are nearly the same.
The result is basically the same too, except for 2 deviations:
1. The setup via Management Console results in an additional subfolder for the image files.
2. The setup via PowerShell results in a disk operation error. (I haven't found a way to fix it after the PowerShell run with only PowerShell commands, as redoing the replication via the Management Console is not viable for the free version.)

Please make sure not to use the same servers as shared storage to store backups (learn more about the 3-2-1 backup rule).
Not doing this. The Windows server is supposed to serve as the platform for setting up the StarWind vSAN VMs and for running Veeam backups to storage mounted from a local hardware NAS (not from the planned shared StarWind iSCSI).
Prandur
Posts: 7
Joined: Mon Jan 03, 2022 2:07 pm

Thu Jan 06, 2022 10:39 pm

Hi yaroslav,

I just apparently found a way to get it working with the PowerShell scripts. (I was at first unable to reproduce the described steps with PowerShell.)
But I doubt this can be intended:

1. Run CreateHA_2.ps1 with the mentioned settings, which fails to get the sync running.
2. Break the HA with RemoveHAPartner.ps1 to remove hv02.
3. Remove the local image from hv02's storage.
4. Add hv02 again with the AddHaPartner.ps1 script and the following settings:

Code:

param($addr="192.168.0.251", $port=3261, $user="root", $password="starwind", $deviceName="HAImage1",
	$addr2="192.168.0.252", $port2=$port, $user2=$user, $password2=$password,
#secondary node
	$imagePath2="/mnt/ssd",
	$imageName2="ssd01",
	$createImage2=$true,
	$targetAlias2="ssd01",
	$autoSynch2=$true,
	$poolName2="pool1",
	$syncSessionCount2=1,
	$aluaOptimized2=$true,
	$syncInterface2="#p1=172.16.255.11:3260" -f $addr,
    $hbInterface2="#p1=172.16.250.11:3260,192.168.0.251:3260",
    $selfSyncInterface="#p1=172.16.255.12:3260" -f $addr2,
    $selfHbInterface="#p1=172.16.250.12:3260,192.168.0.252:3260"
	)

Then it got synced and seems to be working (I haven't tested yet; I will first set up new VMs tomorrow to revert back to the free licence).

But I find it kind of weird: the error (the disk operation error) is on HV01, yet the fix is removing HV02 and re-adding it.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Fri Jan 07, 2022 5:10 am

Thank you for your update.
You cannot revert it to the free license; you need to redeploy the VM.
Noskov.E
Posts: 3
Joined: Thu Jan 06, 2022 12:00 pm

Sun Jan 09, 2022 2:14 pm

I have the same problem when using the VSA.
When creating an HA device, the devices are not synchronized. The script SyncHaDevice.ps1 gives an error:

Code:

1/9 7:39:29.380643 120 FileBrowser: *** CFileBrowser::parsePath: Could not create image : provided path has invalid extension!
1/9 7:39:29.380696 120 Srv: *** iScsiServer::list: Error parsing path: VSA Storage\mnt\
1/9 7:39:29.398050 120 HA: SscPort_ControlRequest: Received Reserve command: 11264 MB
1/9 7:39:30.404790 120 IMG: ImageFile_Extend: Extending/shrinking image by 10240 MBs...
1/9 7:39:30.404834 120 debug: Swn_FileExtendInBytes: File size: 1073741824 - extending for 10737418240 bytes
1/9 7:39:30.404858 120 debug: Swn_GetFileSize: File size: 11811160064, Blocks: 23068672, blksize: 4096 bytes
1/9 7:39:30.404862 120 IMG: ImageFile_Extend: New image size is 11264 MBs.
1/9 7:39:30.404874 120 Common: sw_common::Sw_Disk_Header::open: (file: C:\StarWind\storage\mnt\datastore1-2\datastore1-2.swdsk, readonly: no).
1/9 7:39:30.414320 120 HA: SscPort_ControlRequest: Received Extend command: 11264 MB
1/9 7:39:30.430100 120 conf: ControlConnection::processConnection: Control connection closed.
1/9 7:40:28.616200 6a conf: TelnetListener::listenConnections: Accepted control connection from 192.168.11.78:54446.
1/9 7:40:28.646926 11e FileBrowser: *** CFileBrowser::parsePath: Could not create image : provided path has invalid extension!
1/9 7:40:28.646995 11e Srv: *** iScsiServer::list: Error parsing path: VSA Storage\mnt\
In this case the mount point is /mnt, since it is Linux. The device is created, so the error is confusing.
Do the scripts not work correctly for the VSA?

When creating an HA device, the script SyncHaDevice.ps1 issues an error:

Code:

Synchronize device HAImage1
Failed to perform synchronization (1) from 
-
control 0x0000000000EE04C0 -Synchronize: -SynchronizationType:"1"
-
200 Failed: can't find available or valid partner for synchronization.. 
Deleting and re-adding a partner does not help; the devices stay in the "out of sync" state.

If you use a trial key, you can mark one of the devices as synchronized and then they will be resynchronized. I have not yet found how to do this with a script.