ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Mon May 23, 2011 5:57 pm
by mkaishar
We have an interesting issue and a support call has been opened with VMware, but I would like some input from Starwind also.

We run 3 ESXi 4.1 Enterprise servers with 4 x 1GbE NICs for SAN, using VMware's software iSCSI initiator.
StarWind runs on W2K8R2 on Dell hardware with 2 x 10GbE NICs for SAN (each NIC has 2 IP addresses).
There are 4 networks for SAN (10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24).
MPIO with round robin is enabled on VMware.

This is my rr script in rc.local

Code: Select all

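# Set every device whose ID starts with "eui" to round robin,
# switching to the next path after every 3 I/Os.
esxcli nmp device list | grep '^eui' |
while read -r device ; do
        esxcli nmp device setpolicy --psp VMW_PSP_RR --device "${device}"
        esxcli nmp roundrobin setconfig --type "iops" --iops 3 --device "${device}"
done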
In vmware under dynamic discovery I put the 4 IP addresses of the Starwind storage server (10.0.0.220, 10.0.1.220, 10.0.2.220, 10.0.3.220)

When I do a Rescan All and view the messages log on the VMware server, I notice that each vmkernel tries to connect to every target, whether or not the target is on that vmkernel's network.

This takes a long time, 30+ minutes, depending on the number of targets and the number of vmkernels.

Have other StarWind/VMware customers encountered the same issue?

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Sun May 29, 2011 3:12 pm
by Constantin (staff)
It's OK that the VMkernel tries to reach all available IP addresses. For higher redundancy I would recommend the following:
a) Run all 4 VMkernel and StarWind NICs in one subnet.
b) Manually assign VMNICs to the SW iSCSI HBA using the ESX(i) CLI.
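
For reference, step (b) on ESX/ESXi 4.x is normally done with `esxcli swiscsi nic add`. The following is only a minimal sketch: the helper name `bind_vmk_ports` is hypothetical, and `vmhba33` is an assumed name for the software iSCSI adapter, so check the actual adapter name on the host before running anything like it.

```shell
#!/bin/sh
# Bind each VMkernel port to the software iSCSI adapter (ESX/ESXi 4.x syntax).
# Usage: bind_vmk_ports <vmhba> <vmk> [<vmk> ...]
bind_vmk_ports() {
    hba="$1"
    shift
    for vmk in "$@"; do
        # Stop at the first binding that fails.
        esxcli swiscsi nic add -n "$vmk" -d "$hba" || return 1
    done
    # List the resulting bindings so they can be checked by eye.
    esxcli swiscsi nic list -d "$hba"
}

# Example (vmhba33 is an assumed adapter name, not taken from this thread):
# bind_vmk_ports vmhba33 vmk1 vmk2 vmk3 vmk4
```

A storage rescan is needed afterwards before the new paths show up.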

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Sun May 29, 2011 4:54 pm
by mkaishar
Constantin (staff) wrote:It's OK that the VMkernel tries to reach all available IP addresses. For higher redundancy I would recommend the following:
a) Run all 4 VMkernel and StarWind NICs in one subnet.
b) Manually assign VMNICs to the SW iSCSI HBA using the ESX(i) CLI.
I am already assigning vmnics to swiscsi using the CLI...but your other recommendation, running all vmkernels and StarWind in one subnet, completely contradicts StarWind's documentation about setting up MPIO, and that documentation is the only reason we use multiple subnets.

Has something changed with StarWind's ability to work with MPIO on a single subnet? If I only need one subnet I can easily modify the entire structure, but I need explicit confirmation from StarWind that running 1 subnet for our iSCSI and MPIO would still work.

Here is starwind's basic how-to on mpio http://www.starwindsoftware.com/images/ ... d_MPIO.pdf

Please advise...

Thanks,
Mark

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Sun May 29, 2011 5:39 pm
by Constantin (staff)
We'll soon update the best practices in our documentation. These recommendations are based on the recommendations in VMware technical papers.

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Sun May 29, 2011 6:20 pm
by mkaishar
Constantin (staff) wrote:We'll soon update the best practices in our documentation. These recommendations are based on my personal experience gained during VMware Ready testing.
Again...I am not getting any specific responses from StarWind, just generalizations that keep the customer wondering. Since we are a StarWind customer, why not just tell me specifically whether I need one or multiple subnets for MPIO to work?

I can tell you for a fact that EqualLogic requires only a single subnet for MPIO to work, but I cannot tell you that StarWind only requires a single subnet, because the documentation states multiple subnets :)

Let me explain further: we use VMware ESXi to mount datastores, but we also have VM guests mounting iSCSI volumes through the Microsoft iSCSI initiator and connecting directly to StarWind.


thanks,
mark

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Mon May 30, 2011 4:31 am
by Constantin (staff)
I'm currently out of the office, so I can't go to the tech department responsible for our tech papers to ask them to update the papers and explain why.
I also don't see any problem in your case: all VMkernel ports and VMs can be in one subnet. You should simply use ACLs in StarWind for LUN masking; that way you'll hide the datastore targets from the VMs and the VMs' disks from ESX(i).

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Mon May 30, 2011 5:29 pm
by mkaishar
Constantin (staff) wrote:I'm currently out of the office, so I can't go to the tech department responsible for our tech papers to ask them to update the papers and explain why.
I also don't see any problem in your case: all VMkernel ports and VMs can be in one subnet. You should simply use ACLs in StarWind for LUN masking; that way you'll hide the datastore targets from the VMs and the VMs' disks from ESX(i).
Hello Constantin,

We use ACLs in StarWind to allow only certain initiators access to certain targets, but...
ACLs in StarWind do not prevent VMware vmkernels from scanning targets that are on different iSCSI networks.
ACLs in StarWind do prevent VMware vmkernels from accessing targets that are restricted.
ACLs in StarWind do prevent other initiators from accessing targets that are restricted.

I put in a support ticket with VMware asking why it takes 20-30 minutes for our VMware servers to start, and I included a little overview of our iSCSI setup.

We have 4 vmkernels and 4 subnets for iSCSI. I told them that during startup the logs show the vmkernels accessing targets in their respective subnets, but also trying to access targets in the other subnets and failing, and that this is what causes the host startup delay.

Code: Select all

- vmk1/10.0.0.18 connects to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk1/10.0.0.18 tries to connect to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk1/10.0.0.18 tries to connect to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk1/10.0.0.18 tries to connect to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails

- vmk2/10.0.1.18 connects to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk2/10.0.1.18 tries to connect to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk2/10.0.1.18 tries to connect to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk2/10.0.1.18 tries to connect to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails

- vmk3/10.0.2.18 connects to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk3/10.0.2.18 tries to connect to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk3/10.0.2.18 tries to connect to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk3/10.0.2.18 tries to connect to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails

- vmk4/10.0.3.18 connects to 10.0.3.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) successfully
- vmk4/10.0.3.18 tries to connect to 10.0.0.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk4/10.0.3.18 tries to connect to 10.0.1.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
- vmk4/10.0.3.18 tries to connect to 10.0.2.220 (vmvol00iso, vmvol01, vmvol02, vmvol03, vmvol04, vmvol05, vmvol06, vmvol07) and fails
So VMware support came back and told me that this is standard behavior when using VMware's software iSCSI initiator, and provided the KB below:
http://kb.vmware.com/selfservice/micros ... Id=1024476
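
To put a number on the listing above, the login attempts per rescan can be counted as vmkernels x portals x targets. This is only a rough sketch: `count_logins` is a hypothetical helper, and it assumes every bound vmkernel tries every discovered portal once for each target, as the log excerpt suggests.

```shell
#!/bin/sh
# Count iSCSI login attempts for one rescan.
# Usage: count_logins <vmkernels> <portals> <targets> <reachable portals per vmkernel>
count_logins() {
    vmks=$1
    portals=$2
    targets=$3
    reachable=$4
    total=$((vmks * portals * targets))
    ok=$((vmks * reachable * targets))
    echo "total=$total ok=$ok failed=$((total - ok))"
}

# Four subnets: each vmkernel reaches only the portal in its own subnet.
count_logins 4 4 8 1    # prints "total=128 ok=32 failed=96"

# One subnet: every vmkernel reaches all four portals.
count_logins 4 4 8 4    # prints "total=128 ok=128 failed=0"
```

With 4 subnets, 96 of the 128 attempts can only end by timing out, which would account for the rescan time growing with the number of targets and vmkernels.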

So...if we can run MPIO with VMware, and MPIO with VM guests using the MS iSCSI initiator, with 1 iSCSI subnet instead of 4, then I can scale back our network.

When you have time, let me know if this is feasible. If not, we will have to live with the current limitations: StarWind requiring multiple subnets for MPIO to work, and VMware's software iSCSI initiator scanning multiple networks and timing out.

Thanks,
Mark

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Mon May 30, 2011 6:03 pm
by Constantin (staff)
ACLs in StarWind do not prevent VMware vmkernels from scanning targets that are on different iSCSI networks.
ACLs in StarWind do prevent VMware vmkernels from accessing targets that are restricted.
ACLs in StarWind do prevent other initiators from accessing targets that are restricted.
You can prevent it by using static discovery instead of dynamic discovery. The reason is that ACLs in StarWind work on the server side, not the client side, which is why ACLs can't prevent rescanning from a different subnet.

You can send an email to support to get confirmation that we FULLY support a configuration with 1 subnet, and even recommend it.

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Mon May 30, 2011 6:16 pm
by mkaishar
Constantin (staff) wrote:You can prevent it by using static discovery instead of dynamic discovery. The reason is that ACLs in StarWind work on the server side, not the client side, which is why ACLs can't prevent rescanning from a different subnet.

You can send an email to support to get confirmation that we FULLY support a configuration with 1 subnet, and even recommend it.
I tested with static discovery, same results, but I will test today using 1 subnet and provide feedback.

Thanks,
Mark

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Mon May 30, 2011 8:08 pm
by mkaishar
It works with a single subnet as you recommended...why is this information not posted anywhere?

Starwind needs a real knowledgebase system that is constantly updated. Your techpapers/whitepapers are archaic!

BTW...thank you, your recommendation did resolve my problem.

Thanks,
Mark

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Tue May 31, 2011 3:51 pm
by Constantin (staff)
Anytime. We are working hard on a new KB, and if you have any problems with VMware and StarWind, contact me; my specialty is VMware. :)

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Wed Jun 01, 2011 3:04 pm
by mkaishar
Wanted to give you a heads up...while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work in a single subnet with OS iSCSI initiators like MS Windows or Linux. Although the MPIO config is set to RR, data only travels over the primary NIC.

So we still need multiple subnets to perform MPIO from within operating system initiators.
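
One way to sanity-check that requirement for a guest is to confirm that each of its iSCSI NICs sits in its own /24, matching the 10.0.x.0/24 layout used for the SAN. A minimal sketch in POSIX sh: the function names and the example addresses are hypothetical, and it assumes /24 masks only.

```shell
#!/bin/sh
# Return the /24 network prefix of an IPv4 address (10.0.1.18 -> 10.0.1).
subnet24() {
    echo "${1%.*}"
}

# Succeed only if every address given lies in a different /24.
all_distinct_subnets() {
    seen=""
    for ip in "$@"; do
        net=$(subnet24 "$ip")
        case " $seen " in
            *" $net "*) return 1 ;;   # this /24 was already used by another NIC
        esac
        seen="$seen $net"
    done
    return 0
}

# Example with made-up guest NIC addresses:
if all_distinct_subnets 10.0.0.50 10.0.1.50; then
    echo "each NIC is in its own subnet"
else
    echo "two or more NICs share a subnet"
fi
```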

Thanks,
Mark

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Wed Jun 01, 2011 11:17 pm
by kmax
mkaishar wrote:Wanted to give you a heads up...while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work in a single subnet with OS iSCSI initiators like MS Windows or Linux. Although the MPIO config is set to RR, data only travels over the primary NIC. So we still need multiple subnets to perform MPIO from within operating system initiators.
Can you expand on this in regards to Windows iSCSI initiator support and round-robin?

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Wed Jun 01, 2011 11:25 pm
by mkaishar
kmax wrote:
mkaishar wrote:Wanted to give you a heads up...while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work in a single subnet with OS iSCSI initiators like MS Windows or Linux. Although the MPIO config is set to RR, data only travels over the primary NIC. So we still need multiple subnets to perform MPIO from within operating system initiators.
Can you expand on this in regards to Windows iSCSI initiator support and round-robin?
The load balance policy when mounting iscsi targets within the OS, is that what you are asking?

[Attachment: rr.png (15.18 KiB)]

Re: ESXi 4.1u1, MPIO & Starwind slow startup/storage rescan

Posted: Thu Jun 02, 2011 8:03 pm
by anton (staff)
This is very valuable information! Thank you very much for your investigation! It should save quite a bit of time for both support staff and StarWind customers.
mkaishar wrote:Wanted to give you a heads up...while MPIO works with StarWind and VMware in a single subnet, MPIO does NOT work in a single subnet with OS iSCSI initiators like MS Windows or Linux. Although the MPIO config is set to RR, data only travels over the primary NIC.

So we still need multiple subnets to perform MPIO from within operating system initiators.

Thanks,
Mark