VSAN Down

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Anatoly (staff), Max (staff)

CedricT
Posts: 36
Joined: Mon Apr 15, 2019 1:14 pm

Fri Apr 09, 2021 11:38 am

Hi,

We have a two-node StarWind VSAN cluster on vSphere with three volumes configured. For the past two days we have had an issue with only one volume and one StarWind VM.
The volume becomes unreachable, and that StarWind VM cannot be reached through the management console. To fix it, we have to restart the StarWind VSA service on the VM manually. Everything then works for a while (after the SYNC) and then starts failing again.

It starts with these messages, and then the volume becomes unreachable:
Ha-device commande "WRITE". request respond time is longer ...
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Fri Apr 09, 2021 12:00 pm

Greetings,

Could you please check the underlying storage? Also, are any backups running at that time? Please share the logs with me: https://knowledgebase.starwindsoftware. ... collector/
CedricT
Posts: 36
Joined: Mon Apr 15, 2019 1:14 pm

Fri Apr 09, 2021 12:08 pm

Hi,

We have already checked the storage: all SSDs, HDDs, and the RAID controller are OK.

Backups are scheduled at night, but the StarWind VMs are not included in them.

We have collected the logs and sent everything.

Note that we are using the latest version of StarWind (Version 8 build 14033).

Thanks again.
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Fri Apr 09, 2021 2:13 pm

Here is what I found in the logs:
4/9 12:37:43.667222 db Common: CStarWindStorageDevice::hangedTaskExist: Storage device hanged operation: location = C:\StarWind\storage\mnt\disk1\Infra\Infra.img, request = 0x00007F6A5FF1B880, cdb[0] = 0x8A, cdb[1] = 0x00, execution time (10337 msec) more than timeout (7000 msec)!

4/9 12:37:43.667227 db Common: CStarWindStorageDevice::hangedTaskExist: Storage device hanged operation: location = C:\StarWind\storage\mnt\disk1\Infra\Infra.img, request = 0x00007F6A62C012A0, cdb[0] = 0x8A, cdb[1] = 0x00, execution time (8480 msec) more than timeout (7000 msec)!
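To check how often and by how much requests overrun their threshold, the hanged-operation lines can be pulled out of a log with a small script. This is a minimal sketch, assuming the line format is exactly the one in the two messages quoted above (other StarWind log messages may look different); the helper names are hypothetical. It also captures cdb[0]: 0x8A is the SCSI WRITE(16) opcode, which matches the "WRITE" warnings reported earlier in the thread.

```python
import re
from dataclasses import dataclass

# Pattern modeled on the two hangedTaskExist lines quoted above (assumption:
# other builds or log levels may format this message differently).
HANG_RE = re.compile(
    r"hangedTaskExist: Storage device hanged operation: "
    r"location = (?P<location>[^,]+), request = (?P<request>0x[0-9A-Fa-f]+), "
    r"cdb\[0\] = (?P<cdb0>0x[0-9A-Fa-f]+), .*?"
    r"execution time \((?P<exec_ms>\d+) msec\) "
    r"more than timeout \((?P<timeout_ms>\d+) msec\)"
)


@dataclass
class HangedOp:
    location: str    # image file the request was issued against
    cdb0: int        # SCSI opcode (0x8A = WRITE(16))
    exec_ms: int     # how long the request had been running
    timeout_ms: int  # threshold it exceeded


def find_hanged_ops(lines):
    """Return one HangedOp per matching log line; other lines are ignored."""
    ops = []
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            ops.append(HangedOp(m["location"], int(m["cdb0"], 16),
                                int(m["exec_ms"]), int(m["timeout_ms"])))
    return ops


def overrun_ms(op):
    """Milliseconds by which a request exceeded its timeout."""
    return op.exec_ms - op.timeout_ms


def worst_by_location(ops):
    """Map each image file to the longest execution time seen for it."""
    worst = {}
    for op in ops:
        worst[op.location] = max(worst.get(op.location, 0), op.exec_ms)
    return worst
```

Fed the two lines above, it reports overruns of 3337 ms and 1480 ms on Infra.img, with 10337 ms as the worst case.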

Change the following timeout values in the StarWind.cfg configuration file on each node:
<StorPerfDegTimeLimitMs value="15000"/>
<iScsiPingCmdSendCmdTimeoutInSec value="10"/>
<UnderlyingStorageTimeoutInSec value="15"/>

Here is the procedure:
1. Make sure that the HA devices are synchronized.
2. Make sure that all HA devices have active sessions to them from both servers.
3. Stop the StarWind VSAN service on one node: systemctl stop StarWindVSA
4. Go to /opt/StarWind/StarWindVSA/drive_c/StarWind/
5. Make a backup copy of StarWind.cfg.
6. Open StarWind.cfg and set the three parameters to the values listed above.
7. Save and exit.
8. Start the StarWind VSAN service: systemctl start StarWindVSA
9. Wait for the fast synchronization to finish.
10. Repeat steps 1-9 on the second host.
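Steps 3-8 of the procedure above could be scripted per node roughly as follows. This is a minimal sketch, not a StarWind tool: it assumes the three parameters already exist in StarWind.cfg in the same <Name value="..."/> form quoted in this thread, and the ".bak" backup name and the set_cfg_values/update_node helpers are hypothetical. Steps 1, 2 and 9 (checking synchronization and sessions) stay manual because the thread gives no command for them.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Path and service name as given in the procedure above.
CFG_PATH = Path("/opt/StarWind/StarWindVSA/drive_c/StarWind/StarWind.cfg")
SERVICE = "StarWindVSA"

# Timeout values recommended in this thread.
NEW_VALUES = {
    "StorPerfDegTimeLimitMs": "15000",
    "iScsiPingCmdSendCmdTimeoutInSec": "10",
    "UnderlyingStorageTimeoutInSec": "15",
}


def set_cfg_values(text, values):
    """Rewrite value="..." on existing <Name value="..."/> elements.

    Assumption: each parameter is already present in the file. If one is
    missing, raise KeyError instead of guessing where to add it.
    """
    for name, value in values.items():
        pattern = re.compile(r'(<' + re.escape(name) + r'\s+value=")[^"]*(")')
        text, count = pattern.subn(lambda m: m.group(1) + value + m.group(2), text)
        if count == 0:
            raise KeyError(f"{name} not found in config")
    return text


def update_node(cfg_path=CFG_PATH, run=subprocess.run):
    """Steps 3-8 on one node; returns the path of the backup copy."""
    run(["systemctl", "stop", SERVICE], check=True)          # step 3
    try:
        backup = cfg_path.with_name(cfg_path.name + ".bak")
        shutil.copy2(cfg_path, backup)                        # step 5
        # Build the new text first so a missing parameter leaves the file untouched.
        new_text = set_cfg_values(cfg_path.read_text(), NEW_VALUES)
        cfg_path.write_text(new_text)                         # steps 6-7
    finally:
        run(["systemctl", "start", SERVICE], check=True)     # step 8
    return backup
```

Run it on one node, wait for the fast sync to finish, then run it on the second node. The service start sits in a finally block so that a failed edit does not leave the node's VSAN service stopped; in that case the original file is unchanged.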
CedricT
Posts: 36
Joined: Mon Apr 15, 2019 1:14 pm

Mon Apr 12, 2021 9:53 am

Hi,

I applied your recommendations, and there have been no more problems since!

Thanks again!
yaroslav (staff)
Staff
Posts: 2279
Joined: Mon Nov 18, 2019 11:11 am

Mon Apr 12, 2021 2:46 pm

Cedric,

You are always welcome. Do not hesitate to contact us here if assistance is required.