VSAN cluster with single network backed by multiple physical NICs

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Fri Feb 16, 2024 5:00 pm

Hello,

I'm pretty sure our scenario is not supported for VSAN (Free, as a start), but I'm just trying to be sure.

We have several Hyper-V hosts (WS 2022); some run as standalone hosts, plus several S2D clusters. For technical reasons, our hosts usually have 4 x 10+ Gbps NICs: two connected to the DC public network and two connected to the private network, so each network is redundant. Again for technical reasons, such NICs are usually teamed (we're aware of SET; we cannot use it). Our VMs are usually connected to our private network for their public traffic, again for technical reasons. So each network is backed by two redundant physical NICs connected to different uplinks. We can easily aggregate bandwidth that way but, again, we teamed - instead of using SET - because of specific restrictions.
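For reference, our teams are created more or less like this (just a sketch; NIC and team names are examples):

Code: Select all

# LBFO teams instead of SET; NIC/team names are examples
New-NetLbfoTeam -Name "PublicTeam" -TeamMembers "pNIC1","pNIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
New-NetLbfoTeam -Name "PrivateTeam" -TeamMembers "pNIC3","pNIC4" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic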

We run several guest clusters, and so far we have had no real issues with this setup. We have several S2D guest clusters that have basically been running this setup for years, but we also have other kinds of clusters (GlusterFS, Galera, etc.). When appropriate we run 2-node clusters (e.g. S2D); when we want better resiliency we run 3+ node clusters. So far so good. We usually use the other physical network for heartbeats, replication (e.g. Hyper-V Replica), backups and so on. However, from a technical point of view, even if we usually configure 2 networks for clusterized VMs, such networks mostly run on the same physical (and teamed) network, which can grow up to 50 Gbps. In fact, our NICs support 50 Gbps uplinks, though we have usually allocated 2 x 10 Gbps.

While generally speaking we have no issues, there are a couple of scenarios where we run specific PHP applications on Windows Server, and such applications - for the last 2-3 years - don't seem to deal with Windows Server clusters well. Not sure why, because they were very fast up until 2-3 years ago, while now PHP on Windows performs very poorly. However, we need to run such applications on Windows (so please no "switch to Linux" suggestions. Thanks! :)

We tried both setting up an S2D hyper-converged guest cluster for each of them and connecting them (actually, the guest OS) to HA-SMB storage clusters. While the clusters run fast and data transfers are very good, especially when using multipath, they exhibit different issues. In the plain S2D hyper-converged setup, PHP seems unable to deal with its temp files written onto the cluster (hence, we lose sessions). When connected to HA-SMB clusters, the applications perform very poorly, while the cluster and the guest OS are very fast.

Congratulations if you are still reading! :D Our goal would be to get rid of S2D or SMB on such guests, because there is clearly some kind of issue between PHP and such network paths/disks, so we are exploring the option of using VSAN, which we already used in the pre-S2D era and were very happy with. At that time, we were using it to build a WS failover cluster for our VMs.

So my question is: I guess our scenario is absolutely not supported for VSAN. Is that confirmed? Basically, we would need to create 2 logical networks in the guest OS to run sync and traffic (and heartbeats), but such networks would be backed by the aforementioned single physical network, or rather by 2 teamed NICs on our hosts, which provide failover in case of issues. I seem to remember, though, that VSAN needs at least 2 physical networks (1 x traffic/heartbeat + 1 x sync) and that other scenarios are not supported at all.

Is this still the case, or would our double uplink be enough to run both networks in a supported - or at least resilient - way?

Thanks a lot!

Andrea
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Fri Feb 16, 2024 8:03 pm

Hi,

Well, that's impressive. Thanks for posting here :D
Your setup reminds me very much of public cloud systems: traffic mixing (but good bandwidth) and complex networking. We are wary of teaming because of network overheads and stability problems; still, you can look into Node Majority! You will need 3 hosts, though: https://www.starwindsoftware.com/help/N ... ategy.html.
Your setup is not quite configured according to best practices, but Node Majority should minimize the chances of split-brain.
What is the underlying storage configuration, by the way?

P.S. Test it well before moving to prod, in particular channel failures.
TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Sat Feb 17, 2024 10:35 am

Hello Yaroslav,

yes, in fact we're a provider of IaaS and PaaS services. :)

Thank you for your suggestion. Node Majority is definitely an option; that's basically what we're using for our other clusters.

Our hosts are usually configured with 2 x SSD for the OS and such, plus 6 x NVMe as storage. We only work with software-defined storage (hence S2D and VSAN, or storage pools when appropriate) because we need to be able to replicate our configuration in all regions we must support, so we don't use specialized devices.

By the way, I guess we can create a cluster out of 3 nodes and then connect our clients by adding those 3 nodes to each client and enabling multipath (MPIO), just as we do for our HA-SMB cluster, which is external to our clients. Or, we could hyper-converge by installing VSAN directly on our clients. In the latter case we would activate 3 nodes for VSAN but hyper-converge just two of them, since we usually have 2-node clients for such PHP applications.

I read that you suggest not backing up the VM at the hypervisor level, because of the slight suspension of the VM during backup, and instead recommend backing data up inside the guests. Did you experience any issue with Acronis agents?

Thanks,
Andrea
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Sat Feb 17, 2024 11:44 am

Myself, I had issues with Acronis Cyber Protect in clusters (constrained CSV move), but I hope they fixed it. You might need to exclude the StarWind VSAN-related directories (the directory where the images are and where the software is installed), ports (3261 and 3260), and the iSCSI and Sync networks from scanning. Also, make cluster-aware exclusions (Acronis may know how to set exclusions to make their solution cluster-aware).
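Just as an illustration (Acronis configures exclusions in its own console), the same idea expressed with Microsoft Defender cmdlets would look like this; the paths and process name are examples you would need to adapt, and port exclusions live in the AV/backup product itself:

Code: Select all

# Illustration only: path/process exclusions; adapt to your install/image paths
Add-MpPreference -ExclusionPath "C:\Program Files\StarWind Software\StarWind"
Add-MpPreference -ExclusionPath "D:\StarWindImages"
Add-MpPreference -ExclusionProcess "StarWindService.exe"   # assumed service binary name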
An important note on Node Majority: it can be a 3-node, witness-less replica (you might need a GUI-enabled key for that; do not hesitate to DM me, or contact support@starwind.com for technical discussions and demos - 1113028 will be your reference).
Side note: the CVM allows using MDADM, and based on what I hear, MDADM is slightly more performant than Storage Spaces.
Let me know if any help is needed.
TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Sat Feb 17, 2024 2:34 pm

Hello Yaroslav,

thank you again for your kind support.

If we decide to go ahead with VSAN, we wouldn't have a failover cluster, generally speaking. Basically, we would connect our client OS to these multiple instances via iSCSI. We might decide to back up data via the client, that is, the attached disk, and we would probably exclude the disk(s) where the VSAN devices are stored from backup on the VSAN servers themselves. That would only provide backups through the iSCSI uplinks and - generally speaking - should also aggregate bandwidth from all of those 3 servers during backup. Maybe!

OR, we might decide to hyper-converge (without failover clustering: as I mentioned, we're not looking to host VMs there, rather two/three instances of the same application) and again exclude the devices from backup while we keep backing up the attached disk. Not sure if, in such a case, VSS could be an issue even if the devices are excluded.

We have a POC running now, configured via PS. However, since VSAN disks are thick (we work a lot with thin-provisioned disks... our preferred way), we tested extending a single-node device - no issues. However, what is the right procedure to extend a device via PS when it is in a partnership? Can we just extend the device on one of the servers and it gets automatically extended on its partner?

And failover:1 is for node majority?

Thanks,
Andrea
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Sat Feb 17, 2024 5:19 pm

You are always welcome. :)
Yes, 1 is for node majority.
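In the StarWindX sample scripts it is a single variable (a sketch; the exact variable and comment may differ between versions):

Code: Select all

# From the CreateHA sample scripts (version-dependent):
# 0 - heartbeat failover strategy, 1 - node majority
$failover = 1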
I'd suggest trialing the product too to see UI capabilities independently.
You are right: HA devices grow on both nodes once you grow one side.
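With StarWindX it looks roughly like this (a sketch; the address, credentials, and device name are examples, and the method signature may differ between versions):

Code: Select all

Import-Module StarWindX
$server = New-SWServer -host "192.168.1.10" -port 3261 -user root -password starwind
$server.Connect()
# pick the HA device to grow; the name is an example
$device = $server.Devices | Where-Object { $_.Name -eq "HAImage1" }
$device.ExtendDevice(10)   # grow by 10 GB; the partner device grows with it
$server.Disconnect()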
You can use HA devices for storage, yet make sure MPIO is handled on the client side. Also, you might need clustering for hyperconverged setups. Unlike the file-level protocols used in NAS, iSCSI is a block-level protocol, and it cannot arbitrate read/write access to an iSCSI device connected to multiple servers. In order to provide access to one device from multiple servers, the device needs a clustered file system. While VMFS is a clustered file system and no additional actions are needed to see updated data on a datastore, iSCSI storage connected to the nodes of a Microsoft failover cluster should be managed by the cluster and formatted as CSVFS if used as a Cluster Shared Volume (CSV). Such an approach allows sharing a single storage device between the nodes in the cluster and getting updated data on all of them.
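A rough sketch of that flow with standard cmdlets (the portal address and IQN are examples):

Code: Select all

# On each cluster node: connect the StarWind target
New-IscsiTargetPortal -TargetPortalAddress "172.16.10.10"
Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:sw1-csv1" -IsPersistent $true
# On one node: hand the initialized NTFS disk to the cluster, then make it a CSV
Get-ClusterAvailableDisk | Add-ClusterDisk
Add-ClusterSharedVolume -Name "Cluster Disk 1"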
You avoid those sharing complexities only if you stick to the principle of one HA device - one client (e.g., host). Then NTFS is fine.
TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Sat Feb 17, 2024 8:41 pm

Hello,

that's an interesting remark. I thought we could prepare a converged setup instead of a hyper-converged one. But in any case, even to expose an SMB share, it seems we would need a failover cluster.

Or we could connect each of those two clients to a different server, so client1 -> target1 and client2 -> target2, while target3 stays free and only replicates. That would mean, though, that if target1 crashes, client1 will disconnect from storage. Plus, target1 and target2..3 probably have no global locking, so clients could corrupt files if accessing the same devices/files.

Is there any other way to expose HA devices other than creating a failover cluster? I read there's a script for an HA SMB witness, but I'm not sure it could be used for an HA SMB share without the failover cluster.

Thanks,
Andrea
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Sat Feb 17, 2024 9:59 pm

Greetings,

In a nutshell, you need something that locks the file system to a client or makes all clients aware of each other. It can be a file share role that uses a StarWind HA volume, or a cluster file system (e.g., CSVFS).
If the volume is going to be exposed through one share, the share will handle the ownership itself (unless it is just a VM).
Or we could connect each of those two clients to a different server, so client1 -> target1 and client2 -> target2, while target3 stays free and only replicates. That would mean, though, that if target1 crashes, client1 will disconnect from storage. Plus, target1 and target2..3 probably have no global locking, so clients could corrupt files if accessing the same devices/files.
You need to expose one mirror to one client. Say, target 1 and its replica serve only client 1; target 2 and its replica are only for client 2. Such an approach excludes VMs on client 1 moving to client 2.
A simple test is writing files to a target from clients 1 and 2 and trying to browse the files of client 1 on client 2 and vice versa.
Is there any other way to expose HA devices other than creating a failover cluster? I read there's a script for an HA SMB witness, but I'm not sure it could be used for an HA SMB share without the failover cluster.
You can do shares.
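In the simplest non-clustered case, one host connects the HA device over iSCSI (MPIO across the mirrors) and shares the volume out; just keep in mind the share is then only as available as that host. A sketch (share name, path, and group are examples):

Code: Select all

# On the file-serving host; the HA disk is mounted as F: (example)
New-SmbShare -Name "PHPData" -Path "F:\PHPData" -FullAccess "DOMAIN\WebServers"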
TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Wed Feb 21, 2024 9:13 am

Hello Yaroslav and thank you for your support.
You can do shares.
So, while that script included with SW only shows how to create a small (12 MB or so, I think) witness share, is that setup suitable for creating a bigger share, say 1 TB or so? Would that share be highly available and fully SMB 3.1.x compatible, with all the bells and whistles?

Or is that recommended for small SMB shares only, like the witness and so forth?

I'm asking because, as I said, we believe that in our scenario clustering and/or SMB might be the issue we're experiencing, given that we're not aiming to use this for VMs but for a specific guest-installed application.

Thanks,
Andrea
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Wed Feb 21, 2024 10:48 am

Hi,

You are always welcome.
That is a sample script. You can create a larger device using the script. But, given that a full synchronization runs after device creation, I'd suggest creating smaller devices and growing them afterward. The witness on an SMB share will take 4 KB anyway.
From my personal experience, there is no limitation for SMB shares.
clustering and/or SMB might be the issue we're experiencing
Can I please hear more about the issue?

Also, if you are having technical difficulties, please contact support@starwind.com and consider trialing the product (use 1115373 as your reference). During the trial phase, we will assist you from the technical side.
TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Thu Feb 22, 2024 4:58 pm

yaroslav (staff) wrote:
Wed Feb 21, 2024 10:48 am
Hi,

You are always welcome.
That is a sample script. You can create a larger device using the script. But, given that a full synchronization runs after device creation, I'd suggest creating smaller devices and growing them afterward. The witness on an SMB share will take 4 KB anyway.
From my personal experience, there is no limitation for SMB shares.
Thanks. Are these shares highly-available? Do they support SMB 3.1.1?
Can I please hear more about the issue?
Basically, performance from the cluster is fine, but PHP performance when accessing those paths/disks is poor. Not clustering's fault, according to what I hear. The Windows build of PHP has lost a lot of performance lately; I heard this might be an issue with those builds and Windows ASLR, but it might be something else. We had a lot of PHP apps using SMB (e.g. in IIS) which all of a sudden went from great performance to very poor performance. I don't believe it has anything to do with locking, as some say, because we also have those issues when reading, and those clusters can read at good speed (4+ Gbps, for example, on the latest one we had issues with).

In this case we would like to have a setup where we're not forced to use Windows Server failover clustering, to check whether we can improve those issues. If they are highly available, we can give those SW SMB shares a try to see whether that improves performance. If so, we might adopt that setup as our standard for such scenarios.

Thanks.
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 22, 2024 5:52 pm

You are welcome!
Thanks. Are these shares highly-available? Do they support SMB 3.1.1?
I see there is some confusion. I was referring to the StarWind VSAN file share witness (available only for the StarWind VSAN Windows-based app). If you go the Windows-based way, you can repurpose the HA device for the file share later. We have no restrictions on the supported SMB version, to my knowledge. This relates to Node Majority (https://www.starwindsoftware.com/help/C ... tness.html).
You can create a file share in the StarWind VSAN Web UI (CVM-based deployment), but it is not highly available. I need to confirm SMB 3.1.1 support, though.
Let me know which way you prefer.
In this case we would like to have a setup where we're not forced to use Windows Server failover clustering
You do not need failover clustering if you follow the principle of one HA-one client (e.g., VM or host).
TBPrince
Posts: 7
Joined: Fri Feb 16, 2024 4:35 pm

Fri Mar 01, 2024 2:00 pm

Hello Yaroslav,

and as usual thank you for your kind support.
yaroslav (staff) wrote:
Thu Feb 22, 2024 5:52 pm
I see there is some confusion. I was referring to the StarWind VSAN file share witness (available only for the StarWind VSAN Windows-based app). If you go the Windows-based way, you can repurpose the HA device for the file share later. We have no restrictions on the supported SMB version, to my knowledge. This relates to Node Majority (https://www.starwindsoftware.com/help/C ... tness.html).
Yep, I was referring to that one: the witness share for the Windows Server edition of the software. There's a PowerShell script for that, and I was trying to understand whether such a share is highly available once created. The reason for this request is that we could keep storage AND applications separated (converged) while exposing the storage as an SMB share (not for witness, of course; a big one, like 1 TB or so), backed by two or three StarWind machines. But if that share, as created by SW (not the volume, the share), is not highly available, then I think that can't help.

We prefer not to go with CVM-based deployment.
You do not need failover clustering if you follow the principle of one HA-one client (e.g., VM or host).
But in such a case, that is:

Code: Select all

server1 -> target1
              |sync|
server2 -> target2
if target1 crashes, server1 will crash, is that right? So I guess that if we want highly available storage, we need to use the Windows failover clustering file system and expose those targets as iSCSI targets. Otherwise, if we go with one client - one storage, then if the target crashes the client will (probably) crash, or at least it won't be able to access the storage.

Thanks,
Guglielmo Mengora
yaroslav (staff)
Staff
Posts: 2361
Joined: Mon Nov 18, 2019 11:11 am

Fri Mar 01, 2024 3:11 pm

Hi Guglielmo,

I am always glad to help.
But if that share, as created by SW (not the volume, the share), is not highly available, then I think that can't help.
Building a converged setup where the storage is just presented to the host over iSCSI and that host does the SMB sharing could help. Still, I'd suggest contacting support@starwind.com for a more detailed POC environment discussion and build.
if target1 crashes, server1 will crash, is that right?
It all depends on whether the target is HA or not. For HA devices, the client will read/write to the replica. For non-HA devices (e.g., a file share), you are 100% right.
So I guess that if we want highly available storage, we need to use the Windows failover clustering file system and expose those targets as iSCSI targets. Otherwise, if we go with one client - one storage, then if the target crashes the client will (probably) crash, or at least it won't be able to access the storage.
You can expose HA devices (all mirrors) to one client without the need for clustering. MPIO will mask those different paths as one and do the magic for you, so NTFS will not get upset.
If you connect the HA devices (all mirrors) to several clients, NTFS will get upset.
The same applies to mirrors connected to different partners. Say, if an HA device has a replication partner, and that HA device is connected to SERVER1 while its partner is connected to SERVER2, it still poses a risk of corruption.
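For the supported single-client case, the client side looks roughly like this (a sketch; portal addresses are examples):

Code: Select all

# Enable MPIO and claim iSCSI disks (the feature install may need a reboot)
Enable-WindowsOptionalFeature -Online -FeatureName MultiPathIO
Enable-MSDSMAutomaticClaim -BusType iSCSI
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD   # Least Queue Depth
# Connect the same HA device through both mirrors (example portal addresses)
New-IscsiTargetPortal -TargetPortalAddress "172.16.10.10"
New-IscsiTargetPortal -TargetPortalAddress "172.16.10.11"
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true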

I hope I did not confuse you.
Have a great weekend!