StarWind VSAN, installation troubles

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Anatoly (staff), Max (staff)

michael.blanchard
Posts: 12
Joined: Fri Oct 06, 2017 2:59 pm

Fri Oct 06, 2017 3:19 pm

Two servers, connected via one 20Gb InfiniBand link (10.100.100.4), one 10Gb Ethernet link (172.16.9.4), and one 1Gb Ethernet link (172.16.9.24).
Ideally, replication traffic will flow over the InfiniBand and 10Gb Ethernet links, with heartbeat going over the gigabit link.
1x 256GB NVMe drive for flash cache
1x 8TB RAID 10 array (4 drives) on an LSI 9271 with 1GB of cache memory

Following the 2-node converged setup:
1. Managed to get 2 of 4 volumes replicated; the larger volumes (3-5TB) never replicated.
2. After reconfiguring multiple times, the heartbeat networks (gigabit and 10Gb) never work; they always have a red X through them. I verified nothing else was using port 3260.
3. Performance via the local loopback iSCSI connection showed decent read speed but horrible write speed.
4. Syncing an empty 2TB volume seems to take forever; the software initially estimated 3 hours before I went to bed, but it still had not synced by this morning.

10/6 10:51:37.895 1a28 C[6b1], XPT_UP: T3.
10/6 10:51:40.981 20a4 iSERs: IND2Connector::Connect failed with c00000b5
10/6 10:51:40.981 20a4 iSERs: iSerDmSocket::Connect failed with c00000b5!
10/6 10:51:40.983 20a4 Sw: *** ConnectPortal: Unable to connect to the 172.16.9.6:3260 portal from the 0.0.0.0 interface.
10/6 10:51:40.983 20a4 HA: CNIXInitiator::MountTarget: unable to mount the target (1627)!
10/6 10:51:45.985 20a4 iSERs: Created QueuePair (recv 264, init 1056, cq 1320, group 0, affinity 0xf).
10/6 10:51:47.391 ea0 iSERs: IND2Connector::Connect failed with c00000b5
10/6 10:51:47.391 ea0 iSERs: iSerDmSocket::Connect failed with c00000b5!
10/6 10:51:47.393 ea0 Sw: *** ConnectPortal: Unable to connect to the 172.16.9.26:3260 portal from the 0.0.0.0 interface.
10/6 10:51:47.393 ea0 HA: CNIXInitiator::MountTarget: unable to mount the target (1627)!
10/6 10:51:52.394 ea0 iSERs: Created QueuePair (recv 264, init 1056, cq 1320, group 0, affinity 0xf).
10/6 10:53:16.209 1a28 iSERs: Created QueuePair (recv 264, init 1056, cq 1320, group 0, affinity 0xf).
10/6 10:53:16.210 1a28 iSERs: iSER: Accept: using Ird = 0, Ord = 16.
10/6 10:53:16.210 1a28 iSERs: IND2Connector::Accept failed with c000009a
10/6 10:53:16.210 1a28 iSERs: Socket::Accept failed with c000009a
10/6 10:53:16.210 1a28 Srv: Accepted iSER connection from 172.16.9.6:45057 to 172.16.9.4:3260. (Id = 0x6b2)
10/6 10:53:16.210 1a28 S[6b2]: Session (000002190AB9BB00)
10/6 10:53:16.210 1a28 C[6b2], FREE: Event - CONNECTED.
10/6 10:53:16.210 1a28 C[6b2], XPT_UP: T3.
10/6 10:53:21.715 20a4 iSERs: IND2Connector::Connect failed with c00000b5
10/6 10:53:21.715 20a4 iSERs: iSerDmSocket::Connect failed with c00000b5!
10/6 10:53:21.718 20a4 Sw: *** ConnectPortal: Unable to connect to the 172.16.9.6:3260 portal from the 0.0.0.0 interface.
10/6 10:53:21.718 20a4 HA: CNIXInitiator::MountTarget: unable to mount the target (1627)!
10/6 10:53:26.719 20a4 iSERs: Created QueuePair (recv 264, init 1056, cq 1320, group 0, affinity 0xf).
10/6 10:53:28.277 ea0 iSERs: IND2Connector::Connect failed with c00000b5


*** Update 1 ***
Created a 1TB drive with full provisioning, mounted it locally, and the speeds are comparable, so it looks like thin provisioning with dedup is not a good thing to use if you care about write speeds.
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Fri Oct 06, 2017 6:33 pm

Hello michael.blanchard,
Thank you for your interest in the StarWind solution.
Could you please provide the full StarWind device configuration (virtual block size, size, caching, etc.)?
1x 8Tb Raid 10 drive array (4 drives) from an LSI 9271 with 1GB of cache memory
Is that CacheCade hardware caching? If yes, which cache mode are you using for it?
Re-configuring multiple times, the hearbeat networks (gigabit and 10gb) never work, always have a red X through it. I verified nothing else was using port 3260
How is this network connected: directly or via a switch? Can you ping the partner IP at all? Could you also describe in more detail the network logic you are using in this configuration?
Performance via local loopback iSCSI connection was decent read speed, but horrible write speed
Can you share the benchmark results comparing the underlying storage against a StarWind standalone and HA device? We highly recommend using diskspd as the benchmark tool.
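A diskspd comparison along these lines might look like the following. The drive letter, test file size, and queue depth are assumptions; run the same commands once against the raw RAID volume and once against the StarWind iSCSI-mounted volume and compare throughput and latency:

```shell
# 60-second 4K random test, 8 threads, 32 outstanding I/Os per thread,
# software and hardware caching disabled (-Sh), latency stats enabled (-L).
# E: is assumed to be the volume under test. -w100 = 100% writes.
diskspd -b4k -d60 -o32 -t8 -r -w100 -Sh -L -c10G E:\diskspd-test.dat
# Same test as a pure read pass (-w0), reusing the file created above.
diskspd -b4k -d60 -o32 -t8 -r -w0 -Sh -L E:\diskspd-test.dat
```

A large gap on the loopback device relative to the underlying RAID volume points at the StarWind layer (or its caching settings) rather than the storage itself.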

I am looking forward to hearing from you.
michael.blanchard
Posts: 12
Joined: Fri Oct 06, 2017 2:59 pm

Thu Oct 12, 2017 3:18 pm

I've been working on this over the last 3 days and I've about lost my marbles.

The two servers are close in hardware, but not exactly the same:
Node #1:
LSI 9271 with CacheVault, no CacheCade

Node #2:
LSI 9260 without a BBU, but with CacheCade enabled on 2x 250GB SSDs, mirrored.

Both servers have 4GB of L1 cache and 215GB of L2; the L2 is on an NVMe drive.
The network seems to work now, since I changed the subnet of the other NIC to a new VLAN, but performance is still lacking.

So far:
1. I updated to the latest version
2. Tried enabling/disabling ODX; no change in performance
3. Made sure listening is enabled on all iSCSI interfaces in the config file
4. Tried turning the L1 cache on/off via StarWind and turning off the RAID controller memory cache
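The config-file edit in step 3 refers to StarWind's configuration file; later in this thread the parameter is named "iscsidiscoverylistinterfaces". As a sketch only (the exact element name, attribute, and file location are assumptions; verify against the StarWind.cfg shipped with your install):

```xml
<!-- StarWind.cfg (assumed location: the StarWind installation directory) -->
<!-- 1 = listen for iSCSI discovery on all interfaces; 0 = default -->
<iScsiDiscoveryListInterfaces value="1"/>
```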

I see a lot of these errors in the logs on both servers:

10/12 9:46:03.759 218 HA: Warning: SscRequestTask(00000192D2D48430) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6547 ms, DiffTimeCompleteEXEC = 6547 ms!
10/12 9:46:03.776 218 HA: Warning: SscRequestTask(00000192D2D47A10) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6563 ms, DiffTimeCompleteEXEC = 6563 ms!
10/12 9:46:03.802 218 HA: Warning: SscRequestTask(00000192D2D46E40) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6579 ms, DiffTimeCompleteEXEC = 6579 ms!
10/12 9:46:03.910 218 HA: Warning: SscRequestTask(00000191C99F0C00) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6688 ms, DiffTimeCompleteEXEC = 6688 ms!
10/12 9:46:03.978 218 HA: Warning: SscRequestTask(00000192D2D46C90) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6766 ms, DiffTimeCompleteEXEC = 6766 ms!
10/12 9:46:04.078 218 HA: Warning: SscRequestTask(00000192D2D47D70) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6860 ms, DiffTimeCompleteEXEC = 6860 ms!
10/12 9:46:04.104 218 HA: Warning: SscRequestTask(00000192D51C46C0) OpCode(0x2A) ex(0x0) DiffTimeCompleteINIT = 6891 ms, DiffTimeCompleteEXEC = 6891 ms!
10/12 9:46:12.325 24c4 HA: CHADevice::SscRequestTaskExecute: SCSI opcode 0x4D is not supported!
10/12 9:46:12.325 24c4 HA: CHADevice::CompleteSscCommandRequest: (0x4D) CHECK_CONDITION , sense: 0x5 0x20/0x0 returned.
10/12 9:46:13.171 2518 HA: CHADevice::SscRequestTaskExecute: SCSI opcode 0x4D is not supported!
10/12 9:46:13.171 2518 HA: CHADevice::CompleteSscCommandRequest: (0x4D) CHECK_CONDITION , sense: 0x5 0x20/0x0 returned.
10/12 9:51:26.779 24c4 HA: CHADevice::SscRequestTaskExecute: SCSI opcode 0x4D is not supported!
10/12 9:51:26.779 24c4 HA: CHADevice::CompleteSscCommandRequest: (0x4D) CHECK_CONDITION , sense: 0x5 0x20/0x0 returned.
10/12 9:51:26.806 2518 HA: CHADevice::SscRequestTaskExecute: SCSI opcode 0x4D is not supported!
Attachments
Benchmarks, RAID drive on left, StarWind drive on right
Capture.PNG (121.68 KiB)
michael.blanchard
Posts: 12
Joined: Fri Oct 06, 2017 2:59 pm

Thu Oct 12, 2017 8:47 pm

Capture2.PNG (150.34 KiB)
There is definitely something weird going on. I created a 5GB virtual drive on a locally attached SSD and mapped it via iSCSI loopback, without adding the secondary connection to the other server, to rule out the network and the RAID controllers as sources of issues, and I get even worse results:
michael.blanchard
Posts: 12
Joined: Fri Oct 06, 2017 2:59 pm

Thu Oct 12, 2017 10:40 pm

I ran the storage test provided by StarWind, and it backs up what I saw with CrystalDiskMark.
Attachments
First disk is iSCSI loopback, second disk is actual RAID drive
Capture3.PNG (35.47 KiB)
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Fri Oct 13, 2017 11:45 am

Hello Michael,
Could you please collect the logs and share them with us?
For quicker and easier log collection from the StarWind nodes, please use the script from our knowledge base article below:
https://knowledgebase.starwindsoftware. ... collector/
You can upload the collected logs to any cloud storage (Dropbox, Google Drive, OneDrive, etc.) and share the download link.
Thank you.
Ivan (staff)
Staff
Posts: 172
Joined: Thu Mar 09, 2017 6:30 pm

Mon Oct 16, 2017 3:02 pm

Michael,

Please submit a support case (https://www.starwindsoftware.com/support-form) and indicate the forum topic you came from (and/or your forum nickname). We will discuss all your topics there.
michael.blanchard
Posts: 12
Joined: Fri Oct 06, 2017 2:59 pm

Thu Oct 19, 2017 1:07 pm

So I found some issues with my InfiniBand, and after some more testing I've confirmed that the bottleneck is the loopback adapter: mapping a drive via iSCSI across the InfiniBand gives me 4x the write speed to an SSD on the other server. I also noticed that every time I reboot, the "iscsidiscoverylistinterfaces" line is always changed back to 0.
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Tue Oct 31, 2017 10:15 am

Thank you, Michael.
I believe the issue with "iscsidiscoverylistinterfaces" will be resolved if you stop the StarWind service before editing the configuration file.
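The sequence Michael describes can be sketched as follows, from an elevated command prompt. The service name "StarWindService" is an assumption; check the actual name with `sc query` on your node:

```shell
# Stop the StarWind service first, so it does not overwrite StarWind.cfg
# with its in-memory settings when it shuts down.
net stop StarWindService
# ... edit StarWind.cfg here (e.g. set iscsidiscoverylistinterfaces to 1) ...
net start StarWindService
```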