3-Node HA iSCSI - Virtual SAN "Scrubbing"

Petas3 · Mon Feb 12, 2018 2:06 pm

Hello,

I want to know if there's a possibility to do something similar to RAID scrubbing on an HA device that has 3+ nodes (3 nodes are needed for a so-called "scrubbing quorum").
I plan to deploy a low-cost 3-node setup with multiple HA devices, basically a distributed triple mirror, so to speak. Does StarWind support a function like this, or is it planned?
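
As an illustration of the "scrubbing quorum" idea (a minimal sketch of the general majority-vote technique, not a StarWind feature; `scrub_block` and its inputs are hypothetical): with three copies of a block, two matching copies can outvote a corrupted or unreadable third.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def scrub_block(replicas: Sequence[Optional[bytes]]) -> Tuple[Optional[bytes], List[int]]:
    """Compare one block across all replicas.

    Each entry is the block content read from one node, or None if that
    node could not read it (for example, an unrecoverable read error).
    Returns (agreed_data, indexes_of_replicas_to_repair). If no strict
    majority agrees, returns (None, []) because nothing can be trusted.
    """
    counts = Counter(r for r in replicas if r is not None)
    if counts:
        data, votes = counts.most_common(1)[0]
        # A strict majority of ALL replicas must agree (2 of 3).
        if votes > len(replicas) // 2:
            bad = [i for i, r in enumerate(replicas) if r != data]
            return data, bad
    return None, []


# Node 2 returns different data: the other two outvote it.
print(scrub_block([b"A", b"A", b"B"]))
# Node 1 fails to read: the remaining two still agree.
print(scrub_block([b"A", None, b"A"]))
```

With only two nodes a mismatch cannot be resolved this way, which is why three nodes are the minimum for a quorum.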

Thanks

Petas3 · Tue Feb 13, 2018 9:01 am

I have done some more research on this topic and found that this kind of error can be detected and managed somehow, but more questions arose:

Can StarWind deal with an unrecoverable disk error on read? How? Does another node serve the request? Can the data be corrected automatically on a failed read, using information from the other node(s)?

It's very important to resolve this possible failure mode, as it's one of the last concerns I have with this solution.

Thanks

Tue Feb 13, 2018 11:42 am

Hello Petas3,
No, there is no scrubbing in StarWind; this is resolved at the RAID level.
In case of a disk error on read, another node will serve the request to the HA device. The HA device on the node with the error will be turned off until a full sync finishes; the full synchronization process will start on this HA device.
But this situation is possible only on RAID 0 and on disks without RAID. On other RAID types, StarWind most likely will not even see the error; it will be caught by the RAID itself.
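
The behaviour described above can be modelled roughly as follows. This is a toy sketch under stated assumptions, not StarWind code: `Node`, `ha_read`, and `full_sync` are hypothetical names, and the model assumes that rewriting the data during a full sync clears the bad block.

```python
class ReadError(Exception):
    """Raised when a node cannot read a block from its local storage."""


class Node:
    def __init__(self, name, blocks, bad_blocks=()):
        self.name = name
        self.blocks = dict(blocks)         # block number -> data
        self.bad_blocks = set(bad_blocks)  # blocks that fail on read
        self.synchronized = True

    def read(self, lba):
        if lba in self.bad_blocks:
            raise ReadError(f"{self.name}: unrecoverable read error at block {lba}")
        return self.blocks[lba]


def ha_read(nodes, lba):
    """Serve a read from the first synchronized node that can read the block."""
    for node in nodes:
        if not node.synchronized:
            continue
        try:
            return node.read(lba)
        except ReadError:
            # The failing node is taken out of service until a full sync completes.
            node.synchronized = False
    raise ReadError("no synchronized node could serve the read")


def full_sync(target, nodes):
    """Copy all data to `target` from a synchronized partner, then bring it back."""
    sources = [n for n in nodes if n.synchronized and n is not target]
    if not sources:
        raise RuntimeError("no synchronized source node available")
    target.blocks = dict(sources[0].blocks)
    target.bad_blocks.clear()  # assumption: the rewrite remaps the bad sector
    target.synchronized = True


n1 = Node("node1", {0: b"data"}, bad_blocks=[0])
n2 = Node("node2", {0: b"data"})
print(ha_read([n1, n2], 0), n1.synchronized)  # node2 serves; node1 is out of sync
full_sync(n1, [n1, n2])
print(n1.synchronized)
```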

Petas3 · Tue Feb 13, 2018 12:27 pm

Thank you very much for the reply.

Just to completely clarify:
*On a URE another node will eventually serve the request - is this happening at the StarWind fabric level or at the client level? Will there be no notice of this error on the connected client, except perhaps a longer delay?
*Will the resync process begin automatically, resulting in a full resync and automatic restoration of the node after some time? In a 3-node cluster, are reads for this resync process striped across the other 2 nodes, or is just 1 node used?
*Can you perhaps somehow force a partial sync? On a multi-TB volume this could have significant benefits. Or is it best practice to have a LUN with multiple smaller (1-2 TB) HA volumes over a drive/RAID?

Thanks

Wed Feb 14, 2018 2:22 pm

Petas3,

*On a URE another node will eventually serve the request - is this happening at the StarWind fabric level or at the client level?

This is happening at the StarWind level.

Will there be no notice of this error on the connected client, except perhaps a longer delay?

You can see it in the StarWind Management Console, the StarWind logs, and the Windows Application logs.

*Will the resync process begin automatically, resulting in a full resync and automatic restoration of the node after some time?

Yes.

In a 3-node cluster, are reads for this resync process striped across the other 2 nodes, or is just 1 node used?

The resync process will start from all the nodes that are synchronized, in this case from 2 nodes.

*Can you perhaps somehow force a partial sync? On a multi-TB volume this could have significant benefits. Or is it best practice to have a LUN with multiple smaller (1-2 TB) HA volumes over a drive/RAID?

Yes, you can change the priority of the synchronization process. You can also create multiple smaller HA volumes over a drive/RAID. It depends on the type of the underlying storage and the type of VMs you plan to run on these HA volumes.
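
Since a read error leads to a full synchronization of the affected HA device, the amount of data to copy scales with the device size. A rough back-of-the-envelope estimate can put numbers on the smaller-volumes question. The function name and the 500 MB/s throughput below are assumptions for illustration only, not measured or vendor figures.

```python
def full_sync_hours(volume_tb: float, throughput_mb_s: float) -> float:
    """Estimate hours needed to copy a whole HA device at a sustained rate.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores protocol overhead
    and competing client I/O, so real times will be longer.
    """
    volume_mb = volume_tb * 1_000_000
    return volume_mb / throughput_mb_s / 3600


ASSUMED_THROUGHPUT_MB_S = 500  # placeholder; substitute your own sync link/disk rate

for size_tb in (10, 2, 1):
    hours = full_sync_hours(size_tb, ASSUMED_THROUGHPUT_MB_S)
    print(f"{size_tb} TB device: about {hours:.1f} h of full sync")
```

Under these assumptions, an error on one 2 TB device means resyncing 2 TB rather than the whole 10 TB, at the cost of managing more devices.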

Petas3 · Wed Feb 14, 2018 2:53 pm

Thanks, you are the best :-]

Wed Feb 14, 2018 3:30 pm

Thanks