StarWind iSCSI SAN V5.7 Build 20110524

Public beta (bugs, reports, suggestions, features and requests)

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

rchisholm
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Thu Jun 02, 2011 12:56 pm

I was using SQLIO because the majority of my RAID tuning has been for SQL, and those are the numbers I'm used to looking at. It makes it easier for me to compare to other storage systems I've worked with. I'll run some more tests with the other tools for you, and start collecting other stats for myself as well. I have a ton of Citrix Xen work to get done, but as soon as I finish it (probably over the weekend) I'll run more tests from both ESXi and Xen.

I'd like to know some more about the cache manager that you are planning for 5.8. I would like to see something like a cache pool with the ability to set how much total system RAM to use for caching, and to be able to set minimum and maximum amounts of cache per target, as well as a low/medium/high priority, kind of like the resource allocation in VMware. I would also like to be able to set the cache write/read ratios per target to 0/100, 25/75, 50/50, 75/25, or 100/0.

Our next StarWind boxes, which I'll most likely be building in 3-6 months, will be 2 HP DL585 G7s, each with quad 12-core 2.5GHz CPUs, 512GB RAM, 4 LSI 9285-8E controllers, and a 6-port 10GbE Interface Masters PCI-E x16 card for iSCSI. I am trying to decide on the sync channels; 40GbE would be nice if you have some recommendations on a solution that wouldn't require me adding a switch for it. The storage will be DataON DNS-1660 4U 60-drive dual-I/O SAS JBODs. We'll be starting with 2 of the JBODs connected to each server, with the ability to add 2 more to each server, each JBOD having a dedicated 9285-8E. The drives for mass storage will be Seagate 3TB SAS 7.2K drives, and I'm considering some Cheetah 15K.7 600GB drives in RAID 10s for some of our smaller, virtualized databases. Technically, we could do 4 JBODs cascaded from each 9285-8E, but I think additional servers with more RAM for cache would make more sense.

The other ability I need is to asynchronously replicate data between two StarWind HA SANs. The plan is to move the current SAN to our location in the next city over, with a point-to-point fiber connection between it and the new SAN that would be implemented in our Data Center. I currently have a pair of 10GbE fiber connections trunked to 20GbE between our Corporate office and the Data Center, and in about 2 weeks I will have reconfigured two of our older 24TB storage servers into an HA SAN that can be used to test and develop this.

Thanks for all your hard work on this software. I'm more impressed with each new version. I bet you all are making some of the bigger companies more than a little nervous.
anton (staff) wrote:Wow! These are real good numbers! Congratulations. And you guys are not greedy when it comes to buying new hardware :) I'm jealous.

Yes, we've definitely "made some changes to non-cached I/O" as well :)

SQLIO is nice and it does indeed pop up the numbers SQL Server should hit (sick!), but... is there any chance we could see Intel IOMeter and Intel NAS Performance Toolkit test run results as well?

Thank you very much for your cooperation!
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jun 03, 2011 10:30 am

It's absolutely OK and I'm fine with SQLIO :) Take your time. Very much appreciated in any case. You have a very good chance to shape the software you use by reporting feedback and issues.

Yes, the cache would not be static. You'd be able to change settings on the fly and provide high and low watermarks for the *preferred* LUN cache size, as well as an absolute minimum cache size. I'm not sure what your read/write ratios mean, as they don't map well to the cache management algorithm we use.
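
For illustration only, here is a rough sketch of how watermark-based per-LUN cache sizing could behave. This is not StarWind's actual algorithm; the function name, parameters and allocation rule below are assumptions made up for the example:

# Minimal sketch of per-LUN cache sizing with a preferred (high) watermark,
# a low watermark and an absolute minimum. Illustration only -- not StarWind code.

def lun_cache_target(demand_mb, low_wm_mb, high_wm_mb, absolute_min_mb, pool_free_mb):
    """Return how much cache (in MB) a LUN should be granted right now.

    demand_mb       -- cache the LUN's current working set would like to use
    low_wm_mb       -- below this the LUN is considered cache-starved
    high_wm_mb      -- preferred upper bound for this LUN
    absolute_min_mb -- never shrink below this, even under pool pressure
    pool_free_mb    -- cache currently available in the shared pool
    """
    # Grant up to the preferred high watermark, but no more than the low
    # watermark plus whatever the shared pool can spare right now.
    granted = min(demand_mb, high_wm_mb, low_wm_mb + pool_free_mb)
    # Never drop below the absolute minimum reserved for this LUN.
    return max(granted, absolute_min_mb)

# Example: a LUN that wants 2 GB while the pool is under pressure
print(lun_cache_target(demand_mb=2048, low_wm_mb=256, high_wm_mb=1024,
                       absolute_min_mb=128, pool_free_mb=300))   # -> 556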

The configuration sounds great. I'd start with the cheaper drives and move up as requirements grow.

40 GbE is still in its early days. The only experience we have is with Mellanox, and we managed to squeeze 26 Gbps from a 40 GbE link before the PCI-e bus saturated. Right now we're about to start a second iteration with new hardware and a pretty much rewritten StarWind kernel. Use a pair of 10 GbE links (5.8 will be able to use multi-channel sync, so no teaming would be required) and let 40 GbE and 100 GbE mature :)

Async replication for StarWind is coming. However, it looks like we're calling very different features by this name. Soon StarWind will be able to run as dual or triple HA nodes and have some extra nodes (1+) located remotely to keep an extra copy of your data. But that would be a one-day ticket only: a slow WAN would not allow real synchronous replication online, so it's a last-chance recovery copy. If you do have at least a 1 Gb link between your main sites, I don't see any issue with geographic HA. It should work even now.

You're right. Every extra $1M in cash we make is $3M-$5M somebody else didn't make the same year :)
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

rchisholm
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Fri Jun 03, 2011 12:48 pm

That all sounds great.

By read/write ratios I was referring to how some RAID controllers allow you to allocate certain percentages of the cache to read and write specifically. With dynamic control of the cache, I don't think it would be necessary. Dynamic with a good algorithm behind it would probably be much better for a broader range of read/write activities anyway.

When you saturated the PCI-e bus at 26Gbps, were you using PCI-e 2.0 x8 or x16? I'm trying to figure out if I should go to dual 4 port 10GbE PCI-e 2.0 x16 cards in each box after reading your last post.
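
For reference, here is the back-of-the-envelope PCI-e 2.0 arithmetic I'm working from (theoretical numbers only: 5 GT/s per lane with 8b/10b encoding, ignoring TLP and flow-control overhead, so real-world throughput will be noticeably lower):

# Rough PCI-e 2.0 bandwidth estimate: 5 GT/s per lane, 8b/10b line coding.
# Protocol overhead is ignored, so treat these as upper bounds only.

def pcie2_raw_gbps(lanes):
    per_lane_gts = 5.0            # PCI-e 2.0 signalling rate per lane
    encoding_efficiency = 8 / 10  # 8b/10b coding leaves 80% of raw bits for data
    return lanes * per_lane_gts * encoding_efficiency

for lanes in (4, 8, 16):
    print(f"x{lanes}: ~{pcie2_raw_gbps(lanes):.0f} Gbit/s per direction")
# x4 ~16, x8 ~32, x16 ~64 Gbit/s before overhead -- which is why a 40 GbE
# NIC can still top out well below line rate in a busy or constrained slot.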

I don't actually need true geographic HA. I just need to be able to make sure that in the event the Data Center was destroyed, that we wouldn't lose more than the work done that day. We'll have a 1 Gb point to point fiber between the Data Center and the Off Site DR/BC location, but people above me would much rather only run it 100 Mb at half the price. Personally, I think it will be 1 Gb almost immediately after we turn it up. We own the fiber between the Data Center and Corporate Office, and transceivers for 10GbE for a 700 ft fiber link aren't too horribly expensive.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jun 03, 2011 7:14 pm

I see. But I don't think it (read/write allocation) makes sense.

It was PCI-e x16, but you know, what's supposed to happen is not always what we get in the real world. That's it... Check TCP performance before submitting the final purchase order. My $0.02 here :)

So you actually need HA located in one place and async replication to a DR site with a snapshot facility. Right?
rchisholm wrote:That all sounds great.

By read/write ratios I was referring to how some RAID controllers allow you to allocate certain percentages of the cache to read and write specifically. With dynamic control of the cache, I don't think it would be necessary. Dynamic with a good algorithm behind it would probably be much better for a broader range of read/write activities anyway.

When you saturated the PCI-e bus at 26Gbps, were you using PCI-e 2.0 x8 or x16? I'm trying to figure out if I should go to dual 4 port 10GbE PCI-e 2.0 x16 cards in each box after reading your last post.

I don't actually need true geographic HA. I just need to be able to make sure that in the event the Data Center was destroyed, that we wouldn't lose more than the work done that day. We'll have a 1 Gb point to point fiber between the Data Center and the Off Site DR/BC location, but people above me would much rather only run it 100 Mb at half the price. Personally, I think it will be 1 Gb almost immediately after we turn it up. We own the fiber between the Data Center and Corporate Office, and transceivers for 10GbE for a 700 ft fiber link aren't too horribly expensive.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

rchisholm
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Fri Jun 03, 2011 8:29 pm

I agree: with the cache being dynamic, specifying how much to use for reads and how much for writes doesn't make sense.

I think I'll go with 2 of the 4-port 10GbE x16 NICs in each server. Each 4-port card will probably have the same real-world aggregate throughput as a 6-port one. If I need to, I can always throw some more NICs in these servers; they can be set up with up to 4 x16 and 7 x8 cards each.

Correct on the setup. We need very high performance HA in our Data Center, and offsite snapshots so we can bring stuff back up in a relatively short period of time. A lot of our infrastructure will take some manual work to switch over. We'll have the VMware and Xen servers already set up there, a mirror of the main database, and other servers sitting there ready to go. We are set up with 3 dedicated circuits and 3 6KVA UPSs in each rack, and a diesel generator that backs them up and also powers the air conditioning units, so our survivability is quite good. We run 72 hours in advance, so as long as we can be back up and running in 48 hours we're OK.
anton (staff) wrote:I see. But I don't think it (read/write allocation) makes sense.

It was PCI-e x16, but you know, what's supposed to happen is not always what we get in the real world. That's it... Check TCP performance before submitting the final purchase order. My $0.02 here :)

So you actually need HA located in one place and async replication to a DR site with a snapshot facility. Right?
DavidMcKnight
Posts: 39
Joined: Mon Sep 06, 2010 2:59 pm

Mon Jun 06, 2011 2:42 pm

I'm getting an error trying to mount a newly created HA within VMware.

I created a 1TB HA img with the following settings:

<device name="HAImage1"
target="iqn.2008-08.com.starwindsoftware:www.xxx.yyy.zzz-volume-ha-01"
file="My Computer\E\Volume-HA-01.img"
serialId="61ce634f-e7ed-48ef-bc18-46a271597df6"
asyncmode="no"
clustered="yes"
readonly="no"
highavailability="yes"
buffering="no"
header="65536"
reservation="no"
alias="Volume-HA-01"
CacheMode="wb"
CacheSizeMB="512"
CacheBlockExpiryPeriodMS="5000"
/>
The creation process went just fine.

When I go to my vCenter and rescan for datastores, it shows up just fine. Now, I do have dual 10Gig NICs in both of my StarWind servers, so vCenter sees 4 paths to this HA datastore. When I go to "Add Storage", choose the HA volume, and click Next, I get the following error:

Call "HostDatastoreSystem.QueryVmfsDatastoreCreateOptions" for object "datastoreSystem-8187" on vCenter Server "vCenter.yyy.zzz" failed.

I can add a non-HA device just fine.
Bohdan (staff)
Staff
Posts: 435
Joined: Wed May 23, 2007 12:58 pm

Mon Jun 06, 2011 2:50 pm

Hi David,
We need StarWind logs to find what's wrong.
Please make sure that the number of VMkernel ports equals the number of NICs, that they are all in the same subnet, and that vmnic binding was performed.
Here are also some links to similar problems and their solutions:
http://communities.vmware.com/message/1587674
DavidMcKnight
Posts: 39
Joined: Mon Sep 06, 2010 2:59 pm

Mon Jun 06, 2011 7:49 pm

Thanks, the common issue in those posts was a bad RAID build. I reinitialized my RAID volume and tried again. Everything is working now.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jun 10, 2011 7:42 pm

Excellent! Nice to hear you're fine now, and thank you very much for the clarification!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

DavidMcKnight
Posts: 39
Joined: Mon Sep 06, 2010 2:59 pm

Fri Jun 17, 2011 7:12 pm

Should I be able to test deduplication? I don't know if this is something I should be able to test in 5.7 or not. Regardless, I can't seem to even create a DD target. Once I click past the screen for choosing which kind of cache I want (None/WT/WB), I get a "Device test failed for 'DDDisk1'" error.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jun 17, 2011 8:16 pm

Make sure you're running the most recent beta (it's all labeled V5.7, but the build dates differ). After that, a complete StarWind log is required. Verify your version, reproduce the issue, and let us take a look at your logs. That's all...
DavidMcKnight wrote:Should I be able to test deduplication? I don't know if this is something I should be able to test in 5.7 or not. Regardless, I can't seem to even create a DD target. Once I click past the screen for choosing which kind of cache I want (None/WT/WB), I get a "Device test failed for 'DDDisk1'" error.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

DavidMcKnight
Posts: 39
Joined: Mon Sep 06, 2010 2:59 pm

Fri Jul 01, 2011 5:34 pm

anton (staff) wrote:Make sure you're running the most recent beta
I'm still trying to get Deduplication to work. I've installed the June 15th build:

{excerpt from starwind log}

6/26 0:10:09.815 16b8 Srv: StarWind iSCSI SAN Software v5.7.0 (Build 20110615, [SwSAN], Win64)
6/26 0:10:09.815 16b8 Srv: Built Jun 15 2011 13:08:41


I'm getting a different error, but I'm still not able to even create a DD Disk.

When I use these settings and click Next, I get the following error box.

[Screenshot of the error dialog]
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Thu Jul 07, 2011 8:43 am

1) The localization bug will be fixed in the next version, which is coming soon.
2) Please open the starwind.cfg file and find the following section:

<plugin module="DDDisk.dll">
<symlink value="DDDisk"/>
<type value="Deduplicated disk"/>
<imagedir path="*" flags="cmdfv" alias="My Computer" extensions="spbitmap"/>
<imagedir path="*" flags="cmdfv" alias="Metadata" extensions="spmetadata"/>
<volumes value="no"/>
</plugin>


Check whether the string <imagedir path="*" flags="cmdfv" alias="Metadata" extensions="spmetadata"/> is present there. If it is not, please add it exactly as in the example above and restart the StarWind service.
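
If you would rather script that check than do it by hand, here is a small sketch. The starwind.cfg path below is an assumption, so adjust it to your installation, and stop the StarWind service and keep a backup before touching the file:

# Sketch: check starwind.cfg for the Metadata imagedir entry and add it if missing.
# The path is an assumption -- point it at your actual StarWind installation.
import shutil

CFG = r"C:\Program Files\StarWind Software\StarWind\starwind.cfg"  # assumed location
METADATA_LINE = '<imagedir path="*" flags="cmdfv" alias="Metadata" extensions="spmetadata"/>'

with open(CFG, "r", encoding="utf-8") as f:
    text = f.read()

if 'alias="Metadata"' in text:
    print("Metadata imagedir entry already present - nothing to do.")
else:
    shutil.copy(CFG, CFG + ".bak")  # keep a backup copy first
    # Insert the missing line just before the DDDisk plugin block is closed.
    start = text.index('<plugin module="DDDisk.dll">')
    end = text.index("</plugin>", start)
    text = text[:end] + "    " + METADATA_LINE + "\n" + text[end:]
    with open(CFG, "w", encoding="utf-8") as f:
        f.write(text)
    print("Entry added - restart the StarWind service now.")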
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
rchisholm
Posts: 63
Joined: Sat Nov 27, 2010 7:38 pm

Thu Jul 07, 2011 1:32 pm

I did some more testing. 2008 R2 MPIO performance is outstanding with this version. These tests were run from an HP BL465c G7 in a c7000 chassis with 2 Virtual Connect Flex-10s. It has two 8Gb iSCSI HBAs set up. This is going to be one of our 5 new database servers. You have to love having no hard drives at all in a blade that boots from SAN and posts these kinds of numbers. :mrgreen:

C:\Program Files (x86)\SQLIO>sqlio -b64 -o8 -kR -BN -LS -frandom -Fparam.txt
sqlio v1.5.SG
using system counter for latency timings, 2148476 counts per second
parameter file used: param.txt
file d:\testfile.dat with 8 threads (0-7) using mask 0x0 (0)
8 threads reading for 30 secs from file d:\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to not use file nor disk caches (as is SQL Server)
using specified size: 1000 MB for file: d:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 19679.62
MBs/sec: 1229.97
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 2
Max_Latency(ms): 8
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 8 23 18 11 25 11 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

C:\Program Files (x86)\SQLIO>sqlio -b64 -o8 -kW -BN -LS -frandom -Fparam.txt
sqlio v1.5.SG
using system counter for latency timings, 2148476 counts per second
parameter file used: param.txt
file d:\testfile.dat with 8 threads (0-7) using mask 0x0 (0)
8 threads writing for 30 secs to file d:\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to not use file nor disk caches (as is SQL Server)
using specified size: 1000 MB for file: d:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 7003.13
MBs/sec: 437.69
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 8
Max_Latency(ms): 208
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 0 28 17 2 1 0 0 0 0 0 0 0 1 2 7 16 15 6 2 1 0 0 0 0 0
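
As a quick sanity check on the figures above (plain arithmetic, nothing StarWind-specific): the MB/s values are just the reported IOPS multiplied by the 64 KB block size SQLIO was run with (-b64):

# Sanity check: MB/s = IOPS * block size, with 64 KB I/Os (SQLIO's -b64 flag).
def mbps(iops, block_kb=64):
    return iops * block_kb / 1024.0

print(f"reads:  {mbps(19679.62):.2f} MB/s")   # ~1229.98, matches the 1229.97 reported
print(f"writes: {mbps(7003.13):.2f} MB/s")    # ~437.70, matches the 437.69 reported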
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jul 08, 2011 8:14 am

OMG

You're still investing money into hardware on a scale comparable to Liberia's external debt :))

What do you think about native hybrid storage support? I mean automatic data tiering between SSD and HDD. The resulting system is expected to push SSD-comparable benchmark numbers for just a fraction of the cost of an all-SSD setup.
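
To make the idea concrete, here is a toy sketch of frequency-based tiering. This is not StarWind's design, just the general concept; every name below is made up for illustration:

# Toy sketch of SSD/HDD auto-tiering: count accesses per block and keep the
# hottest blocks on the SSD tier. Illustration of the concept only.
from collections import Counter

class TieringSketch:
    def __init__(self, ssd_capacity_blocks):
        self.ssd_capacity = ssd_capacity_blocks
        self.access_counts = Counter()   # block id -> recent access count
        self.ssd_resident = set()        # blocks currently promoted to SSD

    def record_access(self, block_id):
        self.access_counts[block_id] += 1

    def rebalance(self):
        # Keep the N most frequently accessed blocks on SSD, demote the rest.
        hottest = {b for b, _ in self.access_counts.most_common(self.ssd_capacity)}
        promote = hottest - self.ssd_resident
        demote = self.ssd_resident - hottest
        self.ssd_resident = hottest
        return promote, demote

tier = TieringSketch(ssd_capacity_blocks=2)
for block in (1, 1, 1, 2, 3, 3, 4):
    tier.record_access(block)
print(tier.rebalance())   # ({1, 3}, set()) -- blocks 1 and 3 get promoted to SSD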
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
