Recreating HA Availability after failure in 2 Node VSAN


yaroslav (staff)
Staff
Posts: 2355
Joined: Mon Nov 18, 2019 11:11 am

Mon Mar 01, 2021 7:55 am

Everything should be alright.
Treo
Posts: 25
Joined: Sun May 17, 2020 5:04 pm

Wed Mar 03, 2021 5:42 am

Well, almost alright...

After getting the VSAN working correctly and confirming that no data was lost, I ran Cluster Validation (Hyper-V). I get the following error, which was not there in previous validations:
Two disks have been found on node gidshc01.internal.gulfid.com with duplicate disk signatures or disk GUIDs. The disks involved are physical disk 3 and physical disk 4. There may be a Multipath I/O (MPIO) problem such as the software is not installed or is not working properly. If MPIO is not involved, or MPIO has been verified as working, you must either mask one of these disks off at this node, or run validation and specify a disk list that includes only one of these disks, for example by using the Test-Cluster cmdlet in Windows PowerShell.
I've done a lot of digging around in Hyper-V and MPIO forums but couldn't find anything that would help, and I am thinking it may have something to do with my recovery of the VSAN. I'd appreciate any thoughts you may have, Yaroslav.
yaroslav (staff)

Wed Mar 03, 2021 8:32 am

Hey,

Do you have MPIO set on both servers?
Please run log collection as described here https://knowledgebase.starwindsoftware. ... collector/ (there should be a folder called HardDriveInfo) and see what disks 3 and 4 are.
Treo

Wed Mar 03, 2021 10:31 am

Yep, MPIO is set up on both nodes.

I ran the log collection on gidshc01 (the node in the error message). Neither Disk 3 nor Disk 4 appears in the HardDriveInfo output. According to Disk Management, though, Disk 4 is the gidshc01 HAImage1.img disk and Disk 3 is the corresponding node's (gidshc02) disk, forming a CSV.

DeviceID               : \\.\PHYSICALDRIVE5
DiskModel              : STARWIND STARWIND  Multi-Path Disk Device
Partition              : Disk #5, Partition #0
DriveLetter            : G:
VolumeName             : csv2
DiskSizeInGb           : 499.947624206543
PartitionSizeInGb      : 499.982421875
PartitionFreeSpaceInGb : 404.032276153564





DeviceID               : \\.\PHYSICALDRIVE1
DiskModel              : DELL PERC H710P SCSI Disk Device
Partition              : Disk #1, Partition #0
DriveLetter            : D:
VolumeName             : Data
DiskSizeInGb           : 2234.497153759
PartitionSizeInGb      : 2234.482421875
PartitionFreeSpaceInGb : 1032.75970077515





DeviceID               : \\.\PHYSICALDRIVE0
DiskModel              : DELL PERC H710P SCSI Disk Device
Partition              : Disk #0, Partition #2
DriveLetter            : C:
VolumeName             : 
DiskSizeInGb           : 278.868799209595
PartitionSizeInGb      : 278.2734375
PartitionFreeSpaceInGb : 245.918704986572
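
One way to confirm which disk numbers the collector actually reported is to parse the listing rather than read it by eye. The following is a minimal Python sketch, not part of the StarWind log collector: the names `parse_records` and `reported_disks` are made up for illustration, and `SAMPLE` is a trimmed copy of the listing above with the size fields left out. It assumes the file keeps the same "Key : Value" layout with blank lines between records.

```python
import re

# Trimmed copy of the HardDriveInfo listing above (size fields omitted).
SAMPLE = r"""
DeviceID               : \\.\PHYSICALDRIVE5
DiskModel              : STARWIND STARWIND  Multi-Path Disk Device
Partition              : Disk #5, Partition #0
DriveLetter            : G:
VolumeName             : csv2

DeviceID               : \\.\PHYSICALDRIVE1
DiskModel              : DELL PERC H710P SCSI Disk Device
Partition              : Disk #1, Partition #0
DriveLetter            : D:
VolumeName             : Data

DeviceID               : \\.\PHYSICALDRIVE0
DiskModel              : DELL PERC H710P SCSI Disk Device
Partition              : Disk #0, Partition #2
DriveLetter            : C:
VolumeName             :
"""


def parse_records(text):
    """Turn a 'Key : Value' listing into one dict per blank-line-separated record."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            # Split on the first colon only, so values such as "G:" stay intact.
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def reported_disks(records):
    """Map disk number -> disk model, using the trailing digits of DeviceID."""
    disks = {}
    for record in records:
        match = re.search(r"PHYSICALDRIVE(\d+)$", record.get("DeviceID", ""))
        if match:
            disks[int(match.group(1))] = record.get("DiskModel", "")
    return disks


records = parse_records(SAMPLE)
disks = reported_disks(records)

# Disks 3 and 4 are the ones named in the validation error.
missing = [n for n in (3, 4) if n not in disks]
starwind = [n for n, model in disks.items() if "STARWIND" in model.upper()]

print("reported:", sorted(disks))
print("missing:", missing)
print("StarWind devices:", starwind)
```

Run against this listing, the sketch reports disks 0, 1 and 5, with 3 and 4 missing, which matches what was observed by hand.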



yaroslav (staff)

Wed Mar 03, 2021 1:22 pm

Hi,

Those disks are expected to have the same UIDs.
Please see if MPIO is enabled for iSCSI devices.
Treo

Wed Mar 03, 2021 2:56 pm

Hi,

MPIO for iSCSI is indeed on.

Oddly, after a reboot of gidshc01 the cluster error is now reported on Disk 3 and Disk 5, which is the other VSAN disk (HAImage2). HAImage2 is a non-CSV file share drive. This time, though, Disk 3 appears in HardDriveInfo after running the log collector. There is still no entry for the HAImage1 drive.

DeviceID               : \\.\PHYSICALDRIVE3
DiskModel              : STARWIND STARWIND  Multi-Path Disk Device
Partition              : Disk #3, Partition #0
DriveLetter            : G:
VolumeName             : csv2
DiskSizeInGb           : 499.947624206543
PartitionSizeInGb      : 499.982421875
PartitionFreeSpaceInGb : 404.028770446777





DeviceID               : \\.\PHYSICALDRIVE1
DiskModel              : DELL PERC H710P SCSI Disk Device
Partition              : Disk #1, Partition #0
DriveLetter            : D:
VolumeName             : Data
DiskSizeInGb           : 2234.497153759
PartitionSizeInGb      : 2234.482421875
PartitionFreeSpaceInGb : 1032.75970077515





DeviceID               : \\.\PHYSICALDRIVE0
DiskModel              : DELL PERC H710P SCSI Disk Device
Partition              : Disk #0, Partition #2
DriveLetter            : C:
VolumeName             : 
DiskSizeInGb           : 278.868799209595
PartitionSizeInGb      : 278.2734375
PartitionFreeSpaceInGb : 245.236293792725



yaroslav (staff)

Wed Mar 03, 2021 3:11 pm

Treo,

I double-checked the UID question. HA devices share one UID, while flat images have different UIDs. Are the disks in question HA images? If so, they are expected to share one UID.
Otherwise, please share the logs with us.
Also check that MPIO is working properly.
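
As a rough illustration of why the validation complains: when two visible disks carry the same signature or GUID, the check flags them, and with working MPIO the two paths to one HA device would normally be presented as a single disk. The Python sketch below only mimics that grouping idea. The inventory in `DISKS` and the signature strings are hypothetical, invented for the example and not taken from the logs in this thread, and `find_duplicates` is an illustrative helper, not a cluster or StarWind API.

```python
from collections import defaultdict

# Hypothetical inventory of (disk number, disk signature or GUID).
# These values are made up for illustration only.
DISKS = [
    (0, "sig-os"),
    (1, "sig-data"),
    (3, "sig-ha1"),  # one path to an HA device
    (4, "sig-ha1"),  # second path to the same HA device, not merged by MPIO
    (5, "sig-ha2"),
]


def find_duplicates(disks):
    """Return {signature: [disk numbers]} for signatures seen on more than one disk."""
    by_signature = defaultdict(list)
    for number, signature in disks:
        by_signature[signature].append(number)
    return {sig: nums for sig, nums in by_signature.items() if len(nums) > 1}


duplicates = find_duplicates(DISKS)
for signature, numbers in duplicates.items():
    print(f"signature {signature} is visible on disks {numbers}")
```

If the operating system lists both paths as separate disks, as in this made-up inventory, that points at MPIO not claiming the device rather than at the shared UID itself.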
Treo

Thu Mar 04, 2021 4:48 am

Hi Yaroslav,

The disks are HA. They are set up identically to the guide https://www.starwindsoftware.com/resour ... rver-2016/ with HAImage1 (CSV), HAImage2 (non-CSV), and HAImage3 (witness).

MPIO appears to work fine on both nodes. Is there a particular test you would like me to run for it?

BTW, it still puzzles me that nothing appears in HardDriveInfo for the HAImage1 and HAImage3 drives, even though all the storage in the cluster is owned by the node in question (gidshc01).
yaroslav (staff)

Thu Mar 04, 2021 7:17 am

Please check the CSVtoPhysicalDiskMapping folder and share the logs with me.
Treo

Thu Mar 04, 2021 8:47 am

I need some help finding the CSVtoPhysicalDiskMapping folder.

Also, how do you want me to share the logs with you? I assume on a cloud drive rather than posting them to the forum.
yaroslav (staff)

Thu Mar 04, 2021 12:28 pm

That folder is in the log archive. You can share the logs via Google Drive or OneDrive.
Please try reconnecting the recently replicated CSV over iSCSI, and make sure to check the "Enable Multipathing" checkbox.
Treo

Thu Mar 04, 2021 1:27 pm

I thought you might say that. Unfortunately, there is no such folder in the logs. I have uploaded the logs and images of the cluster and VSAN from both nodes to the following link: https://drive.google.com/drive/folders/ ... vV3s-mdM2f

I have reconnected the CSV on both nodes using multipath.
yaroslav (staff)

Thu Mar 04, 2021 1:52 pm

I have collected the logs; feel free to remove the link.
Could you please try running validation again? Let me know if the problem is still there.
Treo

Thu Mar 04, 2021 3:08 pm

Apologies, I confused you with my last statement. What I meant to say is that I reconnected the CSV on both nodes using multipath, and then ran the log collection and took the images. Please tell me if you need anything else.
yaroslav (staff)

Mon Mar 08, 2021 8:10 am

Hi Treo,

There are no headers in the StarWindHeaders folder. Could you copy those and upload them to Google Drive once again?