VSAN for VSphere Installation Issues

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

DarkDiamond
Posts: 29
Joined: Sat Dec 31, 2011 5:57 pm

Wed Nov 05, 2014 6:36 pm

Great :)

I didn't get a chance to add this information until now, but last night I verified that if I pass the RAID adapter through to the SAN VM and present LUNs back to the host, everything works great. Prior to that, I tried removing StarWind and installing the Microsoft iSCSI target, and I saw file copies freeze in the same fashion as with StarWind. I figured that must be due to an LSI/PVSCSI driver issue or contention; either is beyond my ability to troubleshoot.

The advantage I found in passing the RAID adapter through to the VM is that I can now use the MegaRAID Storage Manager application to manage, monitor, and perform maintenance on the array. In addition, performance may be better, as I'm serving LUNs directly off the file system instead of adding a layer of VMFS, then NTFS, in order to present VMFS back to ESXi.

Hope this helps. Maybe this will give another configuration option for Starwind...

Thanks,
Chris
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Nov 05, 2014 7:10 pm

Technically we can do it, but we prefer to keep files on NTFS; there's no performance drawback from that point of view. We also add value (log-structured I/O), and we cannot do that on raw storage (at least for now). Are you using LSFS or flat volumes so far?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

DarkDiamond
Posts: 29
Joined: Sat Dec 31, 2011 5:57 pm

Wed Nov 05, 2014 7:28 pm

All of my volumes are flat. Even though I'm doing a PCI passthrough of the RAID controller to the VM, StarWind VSAN is still serving iSCSI LUNs off an NTFS volume. When I pass the RAID controller through to the SAN VM, the VM sees the actual LSI 9260-4i controller connected to it. The RAID controller exposes the volume to the VM and I format it as NTFS; I'm not passing disks through to the VM as RDMs.

The one drawback to my approach is that I have to store my StarWind VM on a separate drive from the array, since the array volume itself is never presented to the host to build VMs on. That does introduce a point of failure, although it can be alleviated with an additional RAID adapter.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Nov 18, 2014 9:28 am

As long as you "mirror" the data between hosts, the lack of fault tolerance at the host level (the RAID controller is a SPoF, or the whole storage pool is unprotected) is not a big deal :) So I see no issues with your config so far...
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

fstd
Posts: 6
Joined: Mon Nov 09, 2015 10:08 pm

Mon Nov 09, 2015 10:21 pm

Hi,

same issues here,

I tried to use StarWind (latest build) on ESXi deployed as a VSA, but under load the iSCSI LUN freezes, and disk errors 153 and 129 show up in the VSA's Event Viewer. The physical server's RAID controller doesn't record any error, and the LUN hosting the VSA works perfectly.

I tried different configurations with different physical controllers (ServeRAID M5015, LSI 2008) and virtual ones (LSI, VMware Paravirtual), but the problem persists.

I tried different operating systems for the VSA with no luck.

I tried updating the ESXi/RAID controller drivers and the VM hardware version with no results.

Only passing the physical controller through to the VM seems to solve the issue.

Is there anything we can do to use StarWind as a VSA under ESXi without passing through the controller?

kind regards

Alberto
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Mon Nov 09, 2015 10:30 pm

The manual way is sometimes the best way :) Did the job multiple times for me so far.

Just create a new VM and install Windows inside it. Then download the latest StarWind build from the website and install it into the newly created VM. Provision some physical storage to the VM as a VMDK and configure it for StarWind devices. Do the same on the other hosts (or just clone the current one with minor reconfigs) and you are done.
fstd
Posts: 6
Joined: Mon Nov 09, 2015 10:08 pm

Tue Nov 10, 2015 2:23 pm

Unfortunately, even reinstalling Windows from scratch didn't work for me. I'm still seeing "Management command: abort task.." in the StarWind log and event ID 129 in the Windows event log. This behavior shows up immediately when I try to clone a VM on the same LUN, and the clone operation lasts forever.
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Mon Nov 16, 2015 4:32 pm

I have a strange feeling that you are missing something during configuration/initial setup. Could you please give me a bit more description of your configuration and what you are trying to achieve?
fstd
Posts: 6
Joined: Mon Nov 09, 2015 10:08 pm

Mon Nov 16, 2015 9:56 pm

I'm just trying the product in a lab with a simple setup:

single ESXi host 5.5u3 (tried u2 too)

StarWind VSA (latest build) on Windows 2008 R2 (tried 2012 R2 and a build made with the VSA wizard too), hosted on a local datastore (ServeRAID M5015 with SATA disks in RAID 5, but tried an LSI 2008 in IT mode with a single SSD drive too)

no mirror, only one VSA

I've exported a flat image file (tried with/without L1 cache, with/without L2 cache, and using LSFS instead of flat) as an iSCSI LUN to the ESXi host through an internal network (so no physical network issues).

I've then formatted the LUN as VMFS 5 and put a VM inside the iSCSI datastore; the VM is doing some I/O (mainly reads, 60 MBps, 600 IOPS).
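
As a sanity check on the workload figures reported in this post, throughput divided by IOPS gives the average I/O size; a minimal sketch:

```shell
# Average I/O size = throughput / IOPS, using the figures reported above.
mbps=60     # read throughput, MB/s
iops=600    # I/O operations per second
avg_kb=$(( mbps * 1024 / iops ))
echo "average I/O size: ${avg_kb} KB"   # 102 KB, i.e. roughly 100 KB reads
```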

I've left the VM powered on for a few days without any issue.

I've then tried to clone the powered-on VM to the same datastore, and this is the point where the issue arises. As soon as ESXi completes the snapshot and begins to clone the VM, the I/O suddenly stops and many SCSI reset errors appear in the VSA's Event Viewer (tried with LSI and Paravirtual controllers with similar results). The strange thing is that the file system hosting the StarWind image file is perfectly accessible, but the iSCSI LUN seems frozen and ESXi loses the connection to the datastore. The only difference with the Paravirtual controller is that I/O resumes every minute for a few seconds (and the clone completes), whereas with the LSI controller the I/O stops forever (or at least for a longer time).

I've then tried to:

power off the VM and clone it to vm2 on the same datastore ----> all OK
power on the VM and clone vm2 to vm3 on the same datastore -----> all OK
power on vm2 and clone vm3 to vm4 on the same datastore -----> all OK
clone vm2 (powered on) as vm5 -----> I/O stops

The only thing that seems to solve the issue is passing the controller through to the VSA VM; with this configuration (on the same hardware, obviously) things work as expected...

Not sure if this is a StarWind or (Windows-on-VMware) problem...
hste
Posts: 17
Joined: Wed Mar 05, 2014 9:42 pm

Tue Nov 17, 2015 8:04 am

fstd

I had the same problem when I tried it the first time. Mine was a routing problem: I hadn't isolated the iSCSI network with jumbo frames, and some traffic was sometimes routed over my normal VLAN that didn't have jumbo frames.
The VMFS then got corrupted when I did some heavy stuff.


hste
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Tue Nov 17, 2015 9:58 am

A few things with networking that will *always* lead to problems:
1) Using the same subnets for different networks like iSCSI/Synchronization/Management.
2) Incorrect or incomplete jumbo frame settings, like enabling them on the NIC and forgetting about the rest of the network infrastructure (switches and so on). You can always check whether they are OK with the ping -f -l 8000 command.
3) Using the same physical network adapter (and network, obviously) for both iSCSI and Synchronization.
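
The ping check in item 2 can be made exact: for a 9000-byte MTU, the largest payload that survives a don't-fragment ping is the MTU minus the 20-byte IP header and the 8-byte ICMP header. A sketch (the addresses are hypothetical, borrowed from the 192.168.121.x internal network mentioned later in this thread):

```shell
# Compute the largest ICMP payload that fits in a jumbo frame without
# fragmentation: MTU minus 20-byte IP header minus 8-byte ICMP header.
mtu=9000
payload=$(( mtu - 20 - 8 ))
echo "max DF ping payload for MTU $mtu: $payload"   # 8972

# Windows (inside the StarWind VSA): -f sets Don't Fragment, -l sets size.
#   ping -f -l 8972 192.168.121.3
# ESXi shell: -d sets Don't Fragment, -s sets size.
#   vmkping -d -s 8972 192.168.121.1
# If the 8972-byte ping fails while a 1472-byte one succeeds, jumbo frames
# are not enabled end-to-end (vSwitch, vmkernel port, physical switch).
```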
Tarass (Staff)
Staff
Posts: 113
Joined: Mon Oct 06, 2014 10:40 am

Mon Nov 23, 2015 10:56 am

Hi, guys! Still having issues? Anything else I can help you with?
Senior Technical Support Engineer
StarWind Software Inc.
fstd
Posts: 6
Joined: Mon Nov 09, 2015 10:08 pm

Sun Nov 29, 2015 11:32 pm

hi to all,

I've double-checked my config but haven't found anything wrong; in fact, my config is very simple, as there is no:
- synchronization (only one node)
- multipath (only one NIC)
- external network (the ESXi host communicates with the VSA through an internal network)

To check further, I disabled StarWind on the VSA and enabled the Microsoft iSCSI Target service (so, same config except for the iSCSI target), and the VM clone problem went away.

In the StarWind log I see the following lines when the connection drops:

11/29 23:43:43.873 868 C[8], IN_LOGIN: Event - LOGIN_ACCEPT.
11/29 23:43:43.873 868 C[8], LIN: T5.
11/29 23:43:44.123 868 PR: UA 0x2901 returned to opcode 0x28 for session 0x8 from iqn.1998-01.com.vmware:esxlabo-7287f6cc,00023D000001.
11/29 23:44:45.186 f3c T[8,14652]: Management command: abort task (CmdSN 3384728, ITT 0xb7a53300) not found, but is in range.
11/29 23:44:57.202 f3c T[8,14654]: Management command: abort task (CmdSN 3384728, ITT 0xb7a53300) not found, but is in range.
11/29 23:45:02.842 f3c C[8], LIN: recvData returned 10058
11/29 23:45:02.842 f3c C[8], LIN: *** 'recv' thread: recv failed 10058.
11/29 23:45:05.358 e5c Srv: Accepted iSCSI connection from 192.168.121.1:54572 to 192.168.121.3:3260. (Id = 0x9)

ESXi then reconnects; after a few seconds the clone continues, then after some data is transferred the connection drops again, and so on.
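
For triage, the repeated abort-task lines above can be pulled out of the log mechanically. A sketch, assuming the log has been saved to a local file (the filename is an assumption); duplicate CmdSN values indicate the initiator retrying the abort of the same stuck command:

```shell
# Rough triage of a StarWind log: count "abort task" management commands
# and list the CmdSNs involved. The log filename is an assumption; the
# sample lines are the ones quoted above.
log=starwind.log
cat > "$log" <<'EOF'
11/29 23:44:45.186 f3c T[8,14652]: Management command: abort task (CmdSN 3384728, ITT 0xb7a53300) not found, but is in range.
11/29 23:44:57.202 f3c T[8,14654]: Management command: abort task (CmdSN 3384728, ITT 0xb7a53300) not found, but is in range.
EOF

grep -c 'abort task' "$log"                       # prints 2
# Extract CmdSN values; a repeated value means the same command was
# aborted more than once (ESXi retrying against a frozen LUN).
grep -o 'CmdSN [0-9]*' "$log" | sort | uniq -c
```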
darklight
Posts: 185
Joined: Tue Jun 02, 2015 2:04 pm

Tue Dec 01, 2015 5:44 pm

fstd, what kind of starwind device are you using? L1 cache? L2 cache?

SCSI opcode 0x28 (READ(10)) looks to me like some issue with reading...
fstd
Posts: 6
Joined: Mon Nov 09, 2015 10:08 pm

Tue Dec 01, 2015 10:03 pm

I'm using a flat image backend.

I tried with (L1 cache (write-back|disabled) + no L2) and with (L1 write-back + L2), but the results are the same.

Please note that the problem happens only if the cloned powered-on VM is hosted on a StarWind LUN and is cloned to the same or another LUN on the same StarWind VSA. Cloning the VM away from the StarWind VSA doesn't trigger the error (even cloning multiple machines at once), which should exclude read errors.

I tried enabling/disabling VAAI too, with no results.

This issue seems to be strictly related to the clone operation; stressing StarWind with Iometer from within a VM whose VMDK is hosted on the VSA doesn't trigger any error.