Synchronization failure. Disk error 665


drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Wed Jan 25, 2023 6:57 am

I have 2 servers with 3 targets and 1 device per target. Two of them are fine, but the third refuses to synchronize. I think it started with a network issue.
Failover strategy is heartbeat. If I try to manually trigger a full synchronization, it seems to start but fails within a few seconds and the log shows:

IMG: *** ImageFile_IoCompleted: Disk operation failed. Disk path: E:\starwind\Imaging.img. Error code: (665).
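For anyone scanning logs for the same message, here is a minimal sketch that pulls the disk path and error code out of a line shaped like the one above. The pattern is modelled on this single line only, and the lookup table is an assumption: it treats the number as a Win32 system error code, in which case 665 is ERROR_FILE_SYSTEM_LIMITATION ("The requested operation could not be completed due to a file system limitation"). The names `LOG_PATTERN`, `WIN32_CODES`, and `parse_disk_error` are made up for illustration.

```python
import re

# Pattern modelled on the one log line quoted in this post (assumption:
# other StarWind log lines may be formatted differently).
LOG_PATTERN = re.compile(
    r"Disk operation failed\. Disk path: (?P<path>.+?)\. "
    r"Error code: \((?P<code>\d+)\)"
)

# Assumption: the logged number is a Win32 system error code.
WIN32_CODES = {
    665: "ERROR_FILE_SYSTEM_LIMITATION",
}


def parse_disk_error(line):
    """Return path, numeric code, and symbolic name, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "path": match.group("path"),
        "code": code,
        "name": WIN32_CODES.get(code, "unknown"),
    }


if __name__ == "__main__":
    sample = (
        r"IMG: *** ImageFile_IoCompleted: Disk operation failed. "
        r"Disk path: E:\starwind\Imaging.img. Error code: (665)."
    )
    print(parse_disk_error(sample))
```

On Windows, `ctypes.FormatError(665)` should return the system's own message text for the code, which is a quick way to check the table entry.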

I ran chkdsk on the underlying physical disk: there are no issues.

Any ideas?

Thanks
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed Jan 25, 2023 8:57 am

Hi,

Analyzing these events requires context (i.e., the full set of logs plus the System and Application logs) and more details about the system (what the underlying storage is, which hypervisor is used, whether VSAN is running inside a VM, etc.). Still, this appears to be a general storage error. Please tell me more about the system and collect the logs as described here: https://knowledgebase.starwindsoftware. ... collector/
Did it start happening recently, or are you running the first synchronization?
"If I try to manually trigger a full synchronization"
Do you have access to the GUI?
drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Thu Jan 26, 2023 3:32 am

Hi,

The device has been synced before. I think the problem started in early January with a network issue, if I remember correctly. I marked the devices as synchronized on one of the servers, then made the other server sync. Is it possible I marked the wrong device as synchronized?

Starwind VSAN Free (very limited GUI) is running on physical servers (Windows Server 2019). Underlying storage is traditional HDD. One of the devices is used for VMs (Hyper-V) but the one that is not syncing is used for files only.

I'll send you the logs.

Thanks
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Jan 26, 2023 11:58 am

"Is it possible I marked the wrong device as synchronized?"
That is possible. Check this article for more details: https://knowledgebase.starwindsoftware. ... -blackout/. Still, if the data is OK, the correct side was most likely marked.
Please consider reducing synchronization priority.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Sat Jan 28, 2023 7:09 pm

Did adjusting the synchronization priority do the trick?
drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Tue Jan 31, 2023 3:08 pm

No, it didn't. Besides, the data on the synchronized copy had already changed: new files had been added.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Tue Jan 31, 2023 10:43 pm

"data had already changed on the synchronized copy with new files."
The synchronized node is always connected over iSCSI, so the data on it is expected to change.
May I ask how you changed the synchronization priority and what its value was?
Could you please check the hardware logs? Is it a single HDD or is it in RAID?
drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Wed Feb 01, 2023 5:57 pm

I put the other devices in maintenance mode and shut down the VSAN services. Then I manually edited the HA.swdsk files and switched the priorities 0 to 1 and 1 to 0 and then restarted the VSAN services.

The image files are on single HDDs on each server.

What do you want me to check in the logs? I had sent you a link to the logs you requested in a DM.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed Feb 01, 2023 11:06 pm

I noticed from StarWind logs that there are likely network misconfigurations and the build is outdated.
You need to run a SMART diagnostic for the disk as 665 generally points to issues with underlying storage (i.e., level below StarWind VSAN).
Is it possible to arrange downtime? If yes, consider updating StarWind VSAN to 14398 https://rb.gy/fmhsmb. The new build should make the service "tougher" against delays.
"Then I manually edited the HA.swdsk files and switched the priorities 0 to 1 and 1 to 0 and then restarted the VSAN services."
I am afraid you misinterpreted the advice. Please revert the change: it introduces a risk of data corruption. What you changed is not the synchronization priority but the HA device priority.
See more on node priority: https://forums.starwindsoftware.com/vie ... f=5&t=5731.
See more on synchronization priority: https://www.starwindsoftware.com/help/C ... ority.html. There is a script, "C:\Program Files\StarWind Software\StarWind\StarWindX\Samples\powershell\haSyncPriority.ps1"; please use it to set the priority to 5%.
drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Thu Feb 16, 2023 4:18 am

I ran SMART diagnostics and found no issues.
I reverted the HA device priorities, upgraded to 14398, and set synchronization priority to 5% with the script.
It still won't sync.
[Attachment: SMART.gif]
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 16, 2023 8:14 am

I am afraid the HDD is simply too slow for synchronization to start.
drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Sun Feb 19, 2023 10:44 pm

Following the advice in this thread https://forums.starwindsoftware.com/vie ... dsk#p29816, I stopped the VSAN servers, started and then aborted copying the imagefile from the synchronized server, and then restarted the VSANs. It took a while, but the synchronization did complete. Perhaps the imagefile was corrupted.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Mon Feb 20, 2023 6:06 am

Hi,
Thanks for your update.
Did you copy the image file alone or did you copy the headers too?
drzeina
Posts: 7
Joined: Wed Jan 25, 2023 1:30 am

Mon Feb 20, 2023 8:27 pm

I only copied the imagefile as the headers seemed fine.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Mon Feb 20, 2023 10:15 pm

Got it. The image itself is a file, so perhaps something happened at the underlying storage level.