Wed Oct 29, 2014 2:51 am
Just wanted to provide an update...
Since I didn't get any responses, I decided to reinstall the most recent ESXi build on the host I've been experimenting with. I performed only the most basic configuration on the host: enough to get one management NIC working, followed by the local iSCSI traffic vSwitch as per Step 3 and the iSCSI vmkernel port on that vSwitch. I was able to mount a LUN successfully, which is further than I got the first time I tried. I used the E1000E adapter.
I ran CrystalDiskMark several times on the SAN VM to make sure I wasn't having any performance issues with the underlying RAID array. Performance is fine.
After I mounted the LUN, I created a new Win 2012 R2 VM. Performance was horrible. I removed the NICs from the SAN VM, re-added them as VMXNET 3 and rebooted. Performance was much, much better. Once the test VM completed the initial installation, it rebooted itself and began to configure the operating system and install the necessary drivers. At this point the VM locked up, then the LUN got corrupted and unmounted itself. The device stayed visible within ESXi, but the datastore itself dropped off the list. When I tried to add storage, vSphere wanted me to reformat the datastore. The LUN itself within StarWind was configured as a 40GB image disk with 512K of write-back cache. The VM was set up per the document, with 4 CPUs and 4GB of RAM.
These are some of the errors I see in the StarWind log:
10/28 19:29:03.747 9f8 Srv: Worker: GetQueuedCompletionStatus() failed (error 1117)!
10/28 19:29:03.747 9f8 IMG: *** ImageFile_IoCompleted: Error (1117) returned to IO completion!
10/28 19:29:03.747 9f8 IMG: *** ImageFile_ReadWriteSectorsCompleted: Error occured (ScsiStatus = 2, DataTransferLength = 0)!
10/28 19:29:09.969 b3c T[2,7]: Management command: abort task (CmdSN 514896, ITT 0x89db0700) not found.
10/28 19:29:09.969 9f8 IMG: *** ImageFile_ScsiExec: Deferred error reported for session id 5.
10/28 19:29:12.971 9f8 IMG: *** ImageFile_IoCompleted: Error (1117) returned to IO completion!
10/28 19:29:13.080 9f8 IMG: *** ImageFile_ReadWriteWithCacheCompleted: Error (0xC0000001) returned to cache request completion!
10/28 19:29:13.252 9f8 IMG: *** ImageFile_ReadWriteSectorsCompleted: Error occured (ScsiStatus = 2, DataTransferLength = 0)!
10/28 19:29:15.910 a24 PR: [pr] LUN 0, scsiop 0x0: session 0x1 is not registered!
10/28 19:29:15.910 a24 PR: [pr] LUN 0, scsiop 0x89: session 0x1 is not registered!
10/28 19:29:15.910 a24 PR: [pr] LUN 0, scsiop 0x28: session 0x1 is not registered!
10/28 19:29:15.910 b3c C[2], LIN: recvData returned 10058
10/28 19:29:15.910 b3c C[2], LIN: *** 'recv' thread: recv failed 10058.
10/28 19:29:15.910 b58 Tgt: close 'iqn.2012-06.com.cedarwoodtechsolutions:san2-target': 0 session(s) opened, 65536 more allowed.
10/28 19:29:15.910 9f8 IMG: *** ImageFile_IoCompleted: Error (1117) returned to IO completion!
10/28 19:29:15.910 9f8 IMG: *** ImageFile_ReadWriteSectorsCompleted: Error occured (ScsiStatus = 2, DataTransferLength = 0)!
10/28 19:29:15.957 9f8 IMG: *** ImageFile_IoCompleted: Error (1117) returned to IO completion!
10/28 19:29:15.957 9f8 IMG: *** ImageFile_ReadWriteSectorsCompleted: Error occured (ScsiStatus = 2, DataTransferLength = 0)!
10/28 19:29:15.957 9f8 IMG: *** ImageFile_ReadWriteWithCacheCompleted: Error (0xC0000001) returned to cache request completion!
10/28 19:29:18.943 9e4 Srv: Accepted iSCSI connection from 192.168.10.11:61859 to 192.168.10.1:3260. (Id = 0x3)
10/28 19:29:35.061 9f8 IMG: *** ImageFile_ReadWriteSectorsCompleted: Error occured (ScsiStatus = 2, DataTransferLength = 0)!
10/28 19:29:35.061 9f8 IMG: *** ImageFile_ReadWriteWithCacheCompleted: Error (0xC0000001) returned to cache request completion!
10/28 19:29:35.061 9a0 SCSI: VAAI C&W: READ (0x28) returned CHECK CONDITION (3/0/0)!
10/28 19:29:35.061 9f8 IMG: *** ImageFile_IoCompleted: Error (1117) returned to IO completion!
10/28 19:29:35.061 9f8 IMG: *** ImageFile_ReadWriteSectorsCompleted: Error occured (ScsiStatus = 2, DataTransferLength = 0)!
10/28 19:29:35.061 9f8 IMG: *** ImageFile_ReadWriteWithCacheCompleted: Error (0xC0000001) returned to cache request completion!
10/28 19:29:35.061 9a0 SCSI: VAAI C&W: READ (0x28) returned CHECK CONDITION (3/0/0)!
10/28 19:29:35.452 9f8 IMG: *** ImageFile_IoCompleted: Error (1117) returned to IO completion!
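For anyone reading the excerpt: as far as I know, 1117 is the Win32 code ERROR_IO_DEVICE, 10058 is the Winsock code WSAESHUTDOWN, and 0xC0000001 is the NTSTATUS value STATUS_UNSUCCESSFUL. Below is a minimal Python sketch for tallying how often each code shows up in a log like this one. It is not a StarWind tool; the function name, the regex, and the lookup table are my own assumptions, and the regex has only been shaped around the line formats visible above.

```python
import re
from collections import Counter

# Meanings of the codes seen in the excerpt (standard Windows codes, not
# StarWind-specific values). This table is illustrative, not exhaustive.
KNOWN_CODES = {
    "1117": "Win32 ERROR_IO_DEVICE (I/O device error)",
    "10058": "Winsock WSAESHUTDOWN (socket already shut down)",
    "0xC0000001": "NTSTATUS STATUS_UNSUCCESSFUL",
}

# Hypothetical pattern covering the forms in the excerpt:
#   "error 1117", "Error (1117)", "Error (0xC0000001)",
#   "recv failed 10058", "recvData returned 10058"
CODE_RE = re.compile(
    r"(?:[Ee]rror \(?|failed |returned )(0x[0-9A-Fa-f]+|\d+)\b"
)

def tally_error_codes(lines):
    """Count each numeric error code found in the given log lines."""
    counts = Counter()
    for line in lines:
        for code in CODE_RE.findall(line):
            counts[code] += 1
    return counts

if __name__ == "__main__":
    # Two lines copied from the excerpt above, used as sample input.
    sample = [
        "10/28 19:29:03.747 9f8 Srv: Worker: GetQueuedCompletionStatus() failed (error 1117)!",
        "10/28 19:29:13.080 9f8 IMG: *** ImageFile_ReadWriteWithCacheCompleted: Error (0xC0000001) returned to cache request completion!",
    ]
    for code, n in tally_error_codes(sample).most_common():
        print(code, n, KNOWN_CODES.get(code, "unknown"))
```

To run it over a full log, pass an open file object in place of the sample list. Lines without a numeric code, such as the "Error occured (ScsiStatus = 2, ...)" entries, are not counted.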
Throughout all of this, the SAN VM stayed responsive and online (I know because I was watching the console and clicking through the logs and StarWind performance information). To me, this rules out my RAID array as the culprit, and the issue seems to be within StarWind itself. I'm happy to supply the entire log if it would help, and I'm happy to try any suggestions. Otherwise, I'm going to scrap this entire effort, as I can't trust that my data won't be corrupted.
Thanks,
Chris