query on replication for clusters

kktwenty
Posts: 17
Joined: Fri Mar 20, 2020 8:33 am

Fri Mar 20, 2020 3:22 pm

I currently have a 4 node stretch cluster with two "share nothing" physical DAS SANs in Hyper-V 2016 (2 nodes and 1 SAN on each "site"). Whilst this does work, I am unhappy with write performance, async or sync. Infrastructure is 10G between "sites" with a pair of nodes in each site. Performance is much lower than expected when replication is active.

SANs are MD3220 and MD3200 with RAID 1 SSD for logs and RAID 10 900GB 10k SAS as data. 400GB available as LOG and 4TB available as DATA. Connection is DAS via HBA SAS to each pair of nodes (iSCSI is not available on the SAN). Nodes are Dell R630 dual 8c with 256GB RAM. 10Gb interconnects with redundant 10G and 1G fibre heartbeats. Physical Quorum in a third building. Since this is a stretch cluster, primary site SAN can die and secondary SAN takes over redirected. Primary nodes can fail and secondary nodes take over etc. VM Workload is able to be run on a single node if necessary but is usually spread 50/50 over the two primary site nodes. Secondary site is there purely as high availability.

Can I replicate the above with starwind as a vSAN layer on top of the SAN? Basically I am looking to replace microsoft storage replication with starwind (v)SAN replication on the grounds that starwind will be faster. However, since this is a stretch cluster I was unsure if starwind works this way as a "share nothing" approach.

This is my "thinking": I could present both SAN to starwind - a pair of nodes connected to primary SAN via HBA SAS and a single node connected to a secondary SAN via HBA SAS - (8Tb + 800Gb total capacity spread over 2x SANs) then have all 3 nodes see a vSAN 4Tb with Starwind looking after the SAN replication (and using SSD as cache etc). Of course, 2 of the nodes will see the primary SAN and the 3rd node (4th when I spend money!) will see the secondary SAN. If the primary SAN dies (or both primary nodes that are connected to the primary SAN), starwind would still have the secondary SAN with all the data. The vSAN would be presented as a cluster share drive to all 3 nodes in the cluster.

Is this possible?
yaroslav (staff)
Staff
Posts: 2277
Joined: Mon Nov 18, 2019 11:11 am

Mon Mar 23, 2020 8:41 am

Greetings,

Thank you for sharing the configuration. I have some questions and things to point out first.

1)
2 nodes and 1 SAN on each "site"
I am sorry to say this, but that single storage box is a single point of failure. Should it go down, production on one site is stopped.
2)
Infrastructure is 10G between "sites" with a pair of nodes in each site
How many physical connections (i.e., wires) are going from one site to another? If there is only one cable, it is a single point of failure.
3)
Primary nodes can fail and secondary nodes take over etc
and
VM Workload is able to be run on a single node if necessary but is usually spread 50/50 over the two primary site nodes.
Just to make sure that we are on the same page, is it active/active replication or active/passive?
4)
Physical Quorum in a third building.
Effectively, this is a 3-node converged setup with a dedicated witness node.
5)
Since this is a stretch cluster, primary site SAN can die and secondary SAN takes over redirected
That depends on many factors, but yes, it can be tuned to work that way.
6)
Secondary site is there purely as high availability.
Unsure what you mean here. You have only 1 storage box in each site + a witness (which does not actually carry any data). If data is replicated between those 2 boxes, the data is highly available.

Now, let's jump to your question.
Can I replicate the above with starwind as a vSAN layer on top of the SAN?
Yes, it is possible, but there are the things above that I would really like to know before we move on.
(v)SAN replication on the grounds that starwind will be faster.
There are a bunch of things that affect performance. It is a converged (i.e., compute and storage separated) stretched cluster, so networking may be the pain point.
However, since this is a stretch cluster I was unsure if starwind works this way as a "share nothing" approach
StarWind VSAN is fit for stretched clusters. But I really want to discuss the setup.
If the primary SAN dies (or both primary nodes that are connected to the primary SAN), starwind would still have the secondary SAN with all the data
Yes, you are right. If one storage server dies, VMs on site 1 can still access data from site 2 (might be slow, but it works).

In general, the setup you described here will work (a 3-node mirror is a good idea). I would recommend you schedule a tech call with one of the guys in PreSales to discuss all the details (it will be much faster than chatting here :) ).
kktwenty
Posts: 17
Joined: Fri Mar 20, 2020 8:33 am

Tue Mar 24, 2020 1:27 pm

This is a single stretch cluster with storage replication, it is not cluster-to-cluster - I set it up identically to this layout https://docs.microsoft.com/en-us/window ... ed-storage. There are 3 cables between the two sites, 2 independent 10gig cables and a single 1 gig fibre - these were in place before I stretched the cluster as we kept dual backups; I created the stretch cluster because the SAN was a single point of failure. Currently, as things stand, I can lose one of the SANs and the cluster remains workable with 4 nodes (albeit two will have redirected storage). I can lose a whole site and the remaining SAN and 2 nodes will work. I intend to remove the physical quorum and move to an online quorum. The issue is speed: I am not happy with the speed of the writes, they fall below what each site can perform independently. I have ruled out the physical links by removing replication and running the system plus tests from the nodes in the second site using redirected storage - speed increased to expected levels. This leaves the Microsoft implementation of data replication between the two sites as being a bit tardy.
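For the quorum change I am assuming a cloud witness, roughly like this (the storage account name and access key below are placeholders, not real values):

Code:
# Replace the physical disk witness with a cloud witness (hypothetical Azure storage account)
Set-ClusterQuorum -CloudWitness -AccountName "mystorageaccount" -AccessKey "<storage-account-key>"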

My question is:

I wish to continue running a stretched cluster, i.e. one cluster, 2 nodes in site A and 2 nodes in site B. I will have a SAN in site A shared via DAS to node 1 and node 2. I will have a SAN in site B shared via DAS to node 3 and node 4. I wish to leverage starwind to act as a vSAN for nodes 1,2,3,4, which will be presented with the 2x SANs. I would like starwind to create the vSAN with high availability in mind, i.e. I could lose either SAN1 or SAN2 and still be in business.

I assume:

I will install starwind on nodes 1,2,3,4; nodes 1,2 will see SAN1 and nodes 3,4 will see SAN2. Starwind will let me create a vSAN and subsequent LUN(s) which will be presented over iSCSI to nodes 1,2,3,4 for use as a CSV. The vSAN will be highly available so I can lose SAN1 or SAN2 (or links, or both nodes connected to the relevant SAN).
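For illustration, this is roughly how I picture the attach on each Hyper-V node once a StarWind target exists - the IQN filter, portal address and disk/volume names below are placeholders, not actual StarWind values:

Code:
# Connect to the (hypothetical) StarWind iSCSI target from each node, persistently and via MPIO
New-IscsiTargetPortal -TargetPortalAddress "10.10.10.11"
Get-IscsiTarget | Where-Object NodeAddress -like "*csv1*" |
    Connect-IscsiTarget -IsPersistent $true -IsMultipathEnabled $true
# On one node only: initialize and format the new disk, then add it to the cluster as a CSV
$disk = Get-Disk | Where-Object { $_.BusType -eq "iSCSI" -and $_.PartitionStyle -eq "RAW" }
Initialize-Disk -Number $disk.Number -PartitionStyle GPT
New-Partition -DiskNumber $disk.Number -UseMaximumSize | Format-Volume -FileSystem NTFS
Get-ClusterAvailableDisk | Add-ClusterDisk
Add-ClusterSharedVolume -Name "Cluster Disk 2"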


I am aware that the free version is 3 nodes only; therefore, if I want to run this in production I would have to lose node 4 from the setup. Have I assumed correctly, or am I misunderstanding what starwind can do?

Note: when I say "site" these are buildings; "site" is simply how Microsoft defines each side of the stretch cluster. I am using 10G copper at around 60m to give you an idea of distance; the system easily qualifies latency-wise for synchronous replication.
yaroslav (staff)
Staff
Posts: 2277
Joined: Mon Nov 18, 2019 11:11 am

Tue Mar 24, 2020 5:13 pm

Thank you for the update.

Your judgments regarding the availability of the resulting setup are right: yes, you can lose 1 SAN and production will remain operational, but a bit slower. It looks to me that production is running on both sites at the same time.
There are 3 cables between the two sites, 2 independent 10gig cables and a single 1 gig fibre
That is pretty much redundant, thank you for the clarification. How many physical network cards do you have in the servers (check the StarWind VSAN requirements https://www.starwindsoftware.com/system-requirements)?
starwind on nodes 1,2,3,4 nodes
Just to make sure that we are on the same page: SAN means the storage box while node means the compute server, am I right? If so,
StarWind is to be installed on the storage side https://www.starwindsoftware.com/resour ... rver-2016/. My question is, is all the storage in SAN 1 and SAN 2, or are the SANs just additional storage? If the SANs carry the storage and the nodes are for compute resources, the resulting setup will be a 2-node StarWind setup (you can still go with the Free license as VSAN is installed on the SANs).
This leaves the Microsoft implementation of data replication between the two sites as being a bit tardy.
StarWind uses dedicated channels for presenting the storage to compute servers and for synchronization. Having these types of traffic flow over different wires makes such a setup perform better. It is also possible to tweak the networking and StarWind VSAN to ensure the best possible performance. Speaking of performance, here is how we measure it https://www.starwindsoftware.com/best-p ... practices/.
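For example, a diskspd run along these lines (duration, block size, mix and test file path are just an illustration) gives you numbers that can be compared before and after any change:

Code:
# 60 s, 64 KiB blocks, 70/30 read/write, random I/O, 4 threads x 8 outstanding I/Os,
# caching disabled, latency statistics captured, against a 10 GiB test file on the CSV
.\diskspd.exe -d60 -b64K -t4 -o8 -r -w30 -Sh -L -c10G C:\ClusterStorage\Volume1\test.dat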

Please take a look at this document https://www.starwindsoftware.com/resour ... rver-2016/. I still recommend getting one of the StarWind engineers on a call to discuss the resulting setup.
kktwenty
Posts: 17
Joined: Fri Mar 20, 2020 8:33 am

Wed Mar 25, 2020 8:35 am

Each SAN is a physical box of drives - SAN1 is a Dell PowerVault MD3220, SAN2 is a Dell PowerVault MD3200. Each SAN has a mix of SSD and HDD (10k/15k) and connects using DAS SAS - iSCSI is NOT available on either SAN. Each node is a physical Dell R630 server. Each node has an HBA SAS card and connects to a SAN using 2x 6Gb SAS - i.e. node 1 and node 2 connect to SAN1 with 2x 6Gb each, node 3 and node 4 connect to SAN2 with 2x 6Gb each. Each node has 2x physical network cards, one card has 2x 10Gb NIC, one card has 4x1gb. Switches are Netgear XS716T full 10Gb copper. 1Gb switches are Netgear S3300 (10Gb interconnect 1Gb client).

Current node setup is 1x 10Gb network, 1x 10Gb cluster traffic only (replication), 1x 1Gb heartbeat, 1x 1Gb management (2 spare 1Gb). The cluster is able to use the "network" 10Gb at a low priority (for failover) and also the heartbeat 1Gb at a very low priority. I have previously run Live Optics over a typical week to get IOPS and bandwidth peaks/sustained levels.
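In case it helps, those priorities are set via the cluster network roles and metrics, roughly like this (the network names are just what mine happen to be called):

Code:
# List cluster networks with their current role and metric
Get-ClusterNetwork | Format-Table Name, Role, Metric
# Role 1 = cluster traffic only, Role 3 = cluster + client, Role 0 = none
(Get-ClusterNetwork -Name "Cluster Traffic").Role = 1
(Get-ClusterNetwork -Name "Network").Role = 3
# A lower metric makes a network preferred for cluster/CSV traffic
(Get-ClusterNetwork -Name "Heartbeat").Metric = 39000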

My plan was to install starwind in a "storage and compute" configuration on each node, hence a 4 node solution; this would be for maximum redundancy. It is possible I can break my cluster into 1xcompute node and 1xstorage node per "site" but this would not be optimal for me as (in my mind) this would lower my current high availability setup. I understand that the free version supports 3 nodes. I was not sure how starwind would work with two nodes seeing the same storage box, I assumed starwind would see this as MPIO.

Since I am looking to test this, would it be possible to do the following:

Leave my current setup running, install starwind as "storage and compute", present another LUN to nodes 1 and 2 via SAN1 and have starwind ONLY use this LUN. Present another LUN to nodes 3 and 4 via SAN2 and have starwind ONLY use that LUN. That way the current working system can keep running and I will be able to create a starwind vSAN as a migration path?
yaroslav (staff)
Staff
Posts: 2277
Joined: Mon Nov 18, 2019 11:11 am

Wed Mar 25, 2020 11:01 am

Thank you for adding additional details.
Each SAN has a mix of SSD and HDD (10k/15k)
Any automatic storage tiering planned?
Each node has 2x physical network cards, one card has 2x 10Gb NIC, one card has 4x1gb
That fully satisfies StarWind VSAN requirements.
How many cards do SANs have?
Switches are Netgear XS716T full 10Gb copper. 1Gb switches are Netgear S3300 (10Gb interconnect 1Gb client).
If iSCSI traffic goes through a single 10Gb switch, the switches can hardly be considered redundant.
Are the SANs connected directly? In other words, is at least 1 of those wires between buildings going straight from the back of one SAN to the other?
A direct link for sync would be awesome.
I was not sure how starwind would work with two nodes seeing the same storage box, I assumed starwind would see this as MPIO.
That works. You can install MPIO, and the nodes will access the storage of both the local SAN and the one in the other building without any issues (apart from latency). But I think we can tune the setup and see if we get the performance you expect.
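A minimal sketch of getting MPIO in place on the nodes, assuming the Microsoft DSM is used to claim the SAS and iSCSI paths (the load-balance policy is just a common starting point, not a requirement):

Code:
# Add the Multipath I/O feature (a reboot may be required)
Install-WindowsFeature -Name Multipath-IO
# Let the Microsoft DSM claim devices on the SAS and iSCSI buses
Enable-MSDSMAutomaticClaim -BusType SAS
Enable-MSDSMAutomaticClaim -BusType iSCSI
# Least Queue Depth is a common default policy for iSCSI paths
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD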
It is possible I can break my cluster into 1xcompute node and 1xstorage node per "site"
There is no need. 1x storage + 2x compute per site still looks fine (that aligns with the diagram you shared with me here before).

Now, let's go back to your setup. The thing is that a 4-way mirror (if you go with a 4-node hyperconverged setup) is overkill, I guess. Should something bad happen to one site, you lose the storage and 2x compute hosts anyway. Please consider installing StarWind VSAN on the SANs. Data redundancy is less, but StarWind is for storage anyway. By installing VSAN on the SANs you can go with the Free license for a while and go commercial once you feel ready or require support.
On top of that, the nodes hardly satisfy the requirements for a 4-node hyperconverged setup. You still can connect the nodes as a grid (https://www.starwindsoftware.com/grid-architecture-page and https://www.starwindsoftware.com/resour ... hitecture/), but that is hardly possible with the free license in your case.
That is why installing StarWind on the SANs looks like the best option to me. And that is the setup you plan to use, I guess.
kktwenty
Posts: 17
Joined: Fri Mar 20, 2020 8:33 am

Wed Mar 25, 2020 12:47 pm

I think (in the short term) I will cold-store "node 4" and install "free starwind" storage and compute on nodes 1, 2, 3. SAN1 will be presented to nodes 1, 2 with one LUN created just for starwind. Node 3 will have SAN2 presented with an identical size LUN created just for starwind. I will create a starwind vSAN, add this to the existing cluster as a CSV, and move VM storage from the current physical CSV to the starwind (vSAN) CSV.

Regarding switches, yes, our 10G switches are the weak link from an HA point of view; however, if the 10G switch(es) go down I have separate S3300 switch(es) that look after our 1G fibre link and connect to the 1G heartbeat. The cluster (as it stands) is allowed to use this for cluster traffic but at the lowest priority. No network traffic can run over the 1G link, so we would lose client connectivity but not data connectivity. The physical links are VLAN'd accordingly, of course.

You are quite correct, this is an overkill setup - a 3-node compute/storage setup will be sufficient for HA. My initial goal was simply to mitigate storage failure, and I could always add more "cheap storage" to one of the 3 nodes as I have read starwind will prioritize in the case of asymmetrical storage speed.

All questions answered, thank you. I am fully remote working at the moment, so I will probably start this in a few weeks' time when I can return onsite (just in case).
yaroslav (staff)
Staff
Posts: 2277
Joined: Mon Nov 18, 2019 11:11 am

Wed Mar 25, 2020 1:42 pm

Yes, the setup you outlined looks fine to me.
The thing I do not quite get is why you are using physical shared storage alongside a virtual SAN. A 2-node compute-and-storage-separated config with storage replicated between the two looks good. Well, you are the admin there and you know your needs and environment better than anyone else. These were just my thoughts.
I have read starwind will prioritize in the case of asymmetrical storage speed.
Absolutely! ALUA is a gamechanger.

Happy to know that I helped you. Hope to hear from you on a tech call after the quarantine [finally] ends!
Take care, stay healthy.
kktwenty
Posts: 17
Joined: Fri Mar 20, 2020 8:33 am

Thu Mar 26, 2020 1:43 pm

The thing I do not quite get is why you are using physical shared storage alongside a virtual SAN.


Since this is a running system I will need to phase the transition:

a) system uses a Microsoft Storage Replica stretch cluster with replication.
b) create a vSAN and present it as a new CSV
c) move the storage of the VMs to the new CSV
d) remove the Microsoft Storage Replica replication
e) delete the LUN previously used by Microsoft.

These are DAS SAN not iSCSI so I cannot use native Dell SAN hardware/physical replication with failover. I require HA storage, so I currently use the MS inbuilt system of replication between the 2 "sites". I am not happy with the speed of the MS implementation, so I am looking at starwind vSAN as the storage layer. Starwind will take care of the storage replication and present an HA LUN to the cluster - the cluster won't need to worry about storage HA.

Now that I know Starwind will do what I want, I've looked through the documentation. I see the workflow as follows:

1) Before installation of the Starwind server, ensure that my physical SANs are presenting a LUN to the NODES to be used by Starwind.
2) Install Starwind
3) set default storage pool as the SAN LUN presented to NODE (from step 1)
4) create virtual disk in the storage pool (which lives on physical SAN LUN)
5) enable replication with other starwind server(s) to get HA
6) add iscsi target for virtual disk
7) sit back knowing that the virtual disk is HA-replicated between the starwind servers
8) <hyper-v nodes will now attach the iscsi starwind target, add to CSV and migrate VMs as appropriate>
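For step 8, I am picturing something roughly like this per VM once the new CSV is online (the VM and volume names are placeholders):

Code:
# Confirm the new StarWind-backed CSV is online in the cluster
Get-ClusterSharedVolume
# Live-migrate a VM's storage onto the new CSV without downtime
Move-VMStorage -VMName "VM01" -DestinationStoragePath "C:\ClusterStorage\Volume2\VM01"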
yaroslav (staff)
Staff
Posts: 2277
Joined: Mon Nov 18, 2019 11:11 am

Fri Mar 27, 2020 11:15 am

These are DAS SAN not iSCSI
Well, that explains a lot!

Yes, a great plan! Let me know if you need any help.