Starwind wishlist

Aitor_Ibarra · Sat Sep 18, 2010 4:09 pm

Starwind 5.5 is great, almost everything I wished 5.0 was! Here's my wishlist beyond 5.5 - hopefully at least some of these are things you are already working on!

I'm just throwing this out to see what everyone thinks; I'm sure that other people will have other things that are important to them.

I think most of these are relatively simple (which still doesn't mean that they will get done quickly; there may be higher priorities), but the stuff in section 5) is probably quite major and may need to wait until version 6 or beyond.

cheers,

Aitor

1) High Availability features

- if a partner target is lost completely, the ability to rebuild it from the remaining partner without taking the target offline
- turn a non-HA IMG target into an HA target without taking the target offline
- fast sync for targets using WB cache
- extend HA devices without having to recreate them or take them offline
- use MPIO for the sync channel, for more redundancy and performance

2) UI features

- disconnect an initiator without deleting a target
- rename a target without deleting it (with warning that initiators will be disconnected)
- change the cache policy and the amount of RAM allocated without deleting the target or disconnecting initiators
- default window pane sizing: the top-right pane's height shrinks to fit, so you see more of the bottom-right pane by default. Same with the device tab: the devices sub-pane shrinks to fit, so you see more of the device properties
- live monitoring of:
  - read, write and total bandwidth, per device, per HA partner and per initiator
  - read IOPS, write IOPS and total IOPS, per device and per HA partner
  - cache hit ratio (% of reads that are serviced by the WT/WB cache), with a reset button
  - a graphical representation of the cache: a colour-coded horizontal bar showing the proportions of cache that are
    - written to cache, but not yet to disk (WB only)
    - written to disk
    - written to disk and read again
    - written to cache, not yet to disk, and read again (WB only)
    - read from disk, but not yet read again
    - read from disk, and read again
    - unused / expired
  - also expand this to show the amounts in MB & %
- HA resync: show % completed next to the icon in the left-hand tree pane
- HA resync: next to the bar, show an estimate of the remaining time for the rebuild to complete
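The cache hit ratio asked for above is just cache hits divided by total reads since the last reset. A minimal Python sketch, assuming the target can tell for each read whether it was served from the WT/WB cache; the class and method names are hypothetical, not StarWind's API:

```python
class CacheHitCounter:
    """Hypothetical per-target counter for the read cache hit ratio."""

    def __init__(self) -> None:
        self.hits = 0   # reads answered from the WT/WB cache
        self.reads = 0  # all reads seen since the last reset

    def record_read(self, served_from_cache: bool) -> None:
        self.reads += 1
        if served_from_cache:
            self.hits += 1

    def hit_ratio(self) -> float:
        # Percentage of reads serviced by the cache; 0.0 until the first read.
        return 100.0 * self.hits / self.reads if self.reads else 0.0

    def reset(self) -> None:
        # What the proposed "reset button" would do: start a new measurement window.
        self.hits = 0
        self.reads = 0


counter = CacheHitCounter()
for from_cache in (True, True, False, True):
    counter.record_read(from_cache)
print(f"{counter.hit_ratio():.1f}%")  # 75.0%
counter.reset()
print(counter.hit_ratio())  # 0.0
```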

3) Monitoring / alerting features

- write important / critical events to windows event log
- archive / compress / delete old starwind logs
- Windows performance counters for each target/device, with the info from 2)

4) Integration features

- storage event capture. Basically, provide a simple command-line exe that takes a string parameter; when that exe is run, the string appears in a general alert for the Starwind server. This way, a user can use Windows scheduled tasks that are triggered by events recorded by their RAID cards etc., and these can be bubbled up to Starwind so that the administrator is aware of them. In the UI, the user can acknowledge each alert to make it go away (do this like Windows Home Server does Network Health).
- a special version of the above could be used for UPS alerts; on receiving the alert, Starwind could turn off WB caching on all targets
- status dump: an XML file containing the status of everything available in the UI, updated at regular intervals (frequency set by the user) for all servers being managed. This could then be queried by the user's applications.
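The status dump idea could look something like the following minimal Python sketch. The XML element and attribute names, the `servers` mapping and the `collect_status` callback are all hypothetical placeholders for whatever the UI actually knows; the point is the periodic write that swaps the file in one step, so an application polling it always reads a complete document.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET


def build_status_xml(servers):
    """servers: server name -> list of target dicts (hypothetical schema)."""
    root = ET.Element("starwind-status", generated=str(int(time.time())))
    for server_name, targets in servers.items():
        server_el = ET.SubElement(root, "server", name=server_name)
        for target in targets:
            ET.SubElement(
                server_el,
                "target",
                name=target["name"],
                state=target["state"],
                cache_mode=target["cache_mode"],
            )
    return ET.tostring(root, encoding="unicode")


def write_status_file(path, xml_text):
    # Write to a temporary file in the same directory, then swap it into place
    # with os.replace, so a polling reader never sees a half-written dump.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(xml_text)
    os.replace(tmp_path, path)


def dump_forever(path, collect_status, interval_seconds):
    # collect_status is a hypothetical callback returning the servers mapping;
    # interval_seconds is the user-set refresh frequency.
    while True:
        write_status_file(path, build_status_xml(collect_status()))
        time.sleep(interval_seconds)
```

A user's application could then read the file with any XML parser, for example `ET.parse(path).findall("server/target")` in Python.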

5) 2nd level Cache and dedupe

- ability to define a target as a cache of another. So RAM would be the first-level cache, and this on-disk cache (ideally a RAID 0 or 10 of SSDs) would be the second-level cache. It could act as Write Back or Write Through.
- block-level dedupe within targets, so identical blocks are stored once and are more likely to be in the 1st or 2nd level cache. Personally, I would not allow over-provisioning of the saved space; the objective would just be to improve performance
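To illustrate the dedupe idea, here is a toy Python sketch of block-level deduplication: data is split into fixed-size blocks, each block is keyed by its SHA-256 digest, and identical blocks are stored once while a map records which digest each logical block uses. The 4KB block size and the class are assumptions for illustration; a real engine would also need overwrites, reference counting and garbage collection, and might compare bytes rather than trust the hash alone.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed dedupe block size


class DedupStore:
    """Toy block store: each distinct block's bytes are kept once."""

    def __init__(self) -> None:
        self.blocks = {}     # SHA-256 digest -> block bytes (unique blocks only)
        self.block_map = []  # logical block number -> digest

    def write(self, data: bytes) -> None:
        # Append data as consecutive logical blocks (the last one may be short).
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            self.block_map.append(digest)

    def read_block(self, index: int) -> bytes:
        return self.blocks[self.block_map[index]]


store = DedupStore()
store.write(b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE)
print(len(store.block_map), "logical blocks,", len(store.blocks), "stored")  # 4 logical blocks, 2 stored
```

Because the four logical blocks share two stored blocks, a cache only needs to hold the two unique blocks to serve all four, which is the performance effect being asked for.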

Edit, last but not least:
6) CRC / checksum support, verification

- MS initiator supports a cyclic redundancy check on iSCSI data, but Starwind does not support this - you get an initiator error if you try to connect with it enabled. It would be nice to see this feature for really important data, as a per-target option. Intel put special instructions into the Nehalem Xeons to help speed this up, so maybe the performance impact won't be too big.
- I'm not sure if the CRC feature is enough, but another thing for really important data would be verification of every write, to detect unrecoverable read errors on the actual disks before the data is read back later, when it's too late to do anything. Without this you can get silent data corruption. This is an increasing problem with ever-larger drives: the bit error rate is not improving at the same pace as capacity, so there is a greater risk of having some bad data on your disk. It's particularly a problem with cheap 1-2TB SATA drives; I even had this issue with a WD "enterprise" 2TB drive. I'm not sure whether RAID (mirroring or parity) can help with this (the better SAS drives certainly have bit error rates that are orders of magnitude better), or whether it's something the filesystem should be responsible for. But for really critical targets, it would be great to have Starwind verify that the data that just got written to disk is indeed the correct data, even if it kills performance. This should be optional of course!
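For reference, the CRC option in the Microsoft initiator is the iSCSI header/data digest, which the iSCSI standard (RFC 3720) defines as CRC32C (the Castagnoli polynomial); the SSE4.2 `crc32` instruction that arrived with Nehalem computes this same polynomial in hardware. Below is a minimal bitwise Python sketch of the checksum, only to show what a per-target digest option would compute; a real target would use the hardware instruction or a table-driven version rather than this slow loop.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), the checksum used for iSCSI digests."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78  # reflected Castagnoli polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))  # 0xe3069283 (the standard CRC-32C check value)
```

The receiver recomputes the digest over the PDU and compares it with the one sent; any mismatch means the data was altered in transit.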

Sat Sep 18, 2010 5:37 pm

I'll print this one, talk to the R&D guys on Monday and write up a follow-up.

P.S. At least some of the highlighted things are already under heavy development


ozuhairi · Tue Sep 21, 2010 8:52 pm

I wish StarWind would add a Fibre Channel target option to their product (similar to the SCST project on Linux with QLogic HBAs). This would make the product a unified storage target: iSCSI and FC in one box, in addition to Windows' built-in CIFS and NFS, would make this product rock.

Wed Sep 22, 2010 10:07 am

We're an entirely iSCSI shop, so we've put all our money on the horse called "iSCSI + 10 GbE". That's why I don't think you'll see an AoE, FCoE or FC target from StarWind Software...


DavidMcKnight · Wed Sep 29, 2010 2:55 pm

Since someone else started this thread I’ll add my wish.

I wish Starwind had some built in benchmark and diagnostic tools:

I use Starwind in a 10Gig VMware environment. I can Google all sorts of posts about how someone did “this” to get more performance, or someone else did “that” and that's why they lost performance. But there's no definitive guide saying “put these settings here” to get the best performance.

Anyway, there’s a long thread in the main Starwind forum about HA performance where people are being asked to run this utility or that to measure bandwidth.

Put these tools inside of Starwind. 90% of iSCSI is setting it up; once that’s done and done properly you walk away.

So the ability to create a test target(s) in Starwind and have Starwind run some benchmarks would be invaluable.
What performance am I getting with HW cache on or off?
What performance hit am I taking with RAID6 over RAID5?
Am I seeing better speeds with more drives in my RAID?
What’s the speed difference between an IMG file and a physical HD target?
How many iSCSI requests are being answered by the SW cache?
What’s the average latency of an iSCSI request?
What’s the average packet size of the iSCSI requests?
Etc…
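Several of these questions (average latency, average request size) are simple aggregates over per-request samples. A minimal Python sketch, assuming a built-in tool records each request's latency and size; the sample values are made up for illustration, not measurements:

```python
from statistics import mean

# Hypothetical per-request samples a built-in tool might collect:
# (latency in milliseconds, request size in bytes).
samples = [(0.8, 4096), (1.2, 8192), (0.5, 4096), (2.5, 65536)]


def summarize(samples):
    """Answer 'average latency' and 'average request size' from raw samples."""
    latencies = [latency for latency, _ in samples]
    sizes = [size for _, size in samples]
    return {
        "requests": len(samples),
        "avg_latency_ms": mean(latencies),
        "max_latency_ms": max(latencies),
        "avg_request_bytes": mean(sizes),
    }


print(summarize(samples))
```

An average can hide a slow tail, which is why the sketch also reports the maximum; a real tool would likely add percentiles as well.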

Yes, a number of these benchmarks could be done with third-party software, but it doesn’t matter what some third-party software says; I need to know what Starwind can do.

To continue: when using Starwind in HA mode, having Starwind run some network benchmarks against the other node would be invaluable.
What is the max network traffic my Starwind box can generate?
What is the CPU load when Starwind is under load?
What is the latency of the iSCSI query round trip?
What is the impact of TCP offloading?
What is the impact of jumbo frames?
Is my bottleneck the RAID or the Network?
Etc…

And then there is simply getting better HA performance: I'd like to simulate some controlled iSCSI traffic and get feedback on what and where the bottleneck is in HA replication.

To take this one step further… If you were to create (in my case) a VMware virtual appliance that could act as an agent for Starwind and generate simulated traffic to benchmark and test some of the advanced, cryptic settings in VMware, a tool like that would be invaluable for squeezing every Mb out of iSCSI and Starwind.

So again, my wish is for Starwind to incorporate some benchmark and diagnostic screens for those of us who want to get as much as we can out of iSCSI and Starwind, plus everything that Aitor wanted too.

Mon Dec 20, 2010 2:01 pm

OK, here we are with some plans / estimates. Please see my comments after the "***" marks. BIG KISS to Alex for providing me with the required info.


=====================================================================================================================================================

Starwind 5.5 is great, almost everything I wished 5.0 was! Here's my wishlist beyond 5.5 - hopefully at least some of these are things you are already working on!

I'm just throwing this out to see what everyone thinks; I'm sure that other people will have other things that are important to them.

I think most of these are relatively simple (which still doesn't mean that they will get done quickly; there may be higher priorities), but the stuff in section 5) is probably quite major and may need to wait until version 6 or beyond.

cheers,

Aitor

1) High Availability features

- if a partner target is lost completely, the ability to rebuild it from the remaining partner without taking the target offline

*** Planned feature for the V6 RELEASE; we'll probably present it in the V5.8 BETA as an RFC version.

- turn a non-HA IMG target into an HA target without taking the target offline

*** Post V6 plans...

- fast sync for targets using WB cache

*** This one is in V5.5 RELEASE.

- extend HA devices without having to recreate them or take them offline

*** Post V6 plans...

- use MPIO for the sync channel, for more redundancy and performance

*** We'll do it either the manual MPIO way or the standard MPIO way in some pre-V6 version (V6 will have it for sure), probably V5.8 or so.

2) UI features

- disconnect an initiator without deleting a target

*** We'll return this feature in V5.7 (was part of 4.x branch BTW).

- rename a target without deleting it (with warning that initiators will be disconnected)

*** No exact ETA or version number yet, but in pre-V6 for sure.

- change the cache policy and the amount of RAM allocated without deleting the target or disconnecting initiators

*** Huge plans for post Q1/2011 versions.

- default window pane sizing: the top-right pane's height shrinks to fit, so you see more of the bottom-right pane by default. Same with the device tab: the devices sub-pane shrinks to fit, so you see more of the device properties

*** Fixed already. We'll release it as part of 5.6 minor update.

- live monitoring of:
  - read, write and total bandwidth, per device, per HA partner and per initiator
  - read IOPS, write IOPS and total IOPS, per device and per HA partner
  - cache hit ratio (% of reads that are serviced by the WT/WB cache), with a reset button
  - a graphical representation of the cache: a colour-coded horizontal bar showing the proportions of cache that are
    - written to cache, but not yet to disk (WB only)
    - written to disk
    - written to disk and read again
    - written to cache, not yet to disk, and read again (WB only)
    - read from disk, but not yet read again
    - read from disk, and read again
    - unused / expired
  - also expand this to show the amounts in MB & %

*** First version with monitoring will be released as 5.7, January-February 2011. We'll be adding reported values one-by-one with every post V5.7 release.

- HA resync: show % completed next to the icon in the left-hand tree pane
- HA resync: next to the bar, show an estimate of the remaining time for the rebuild to complete

*** Planned for V5.7 update.
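A minimal Python sketch of the "% completed" and remaining-time display for HA resync, assuming the sync engine exposes bytes synced, total bytes and elapsed time (hypothetical inputs). It extrapolates from the average rate since the resync started:

```python
def percent_complete(bytes_done: int, bytes_total: int) -> float:
    return 100.0 * bytes_done / bytes_total


def resync_eta_seconds(bytes_done: int, bytes_total: int, elapsed_seconds: float):
    """Remaining time, assuming the average rate so far continues."""
    if bytes_done <= 0 or elapsed_seconds <= 0:
        return None  # no progress yet, so no rate to extrapolate from
    # remaining / (done / elapsed), rearranged to avoid an intermediate rate
    return (bytes_total - bytes_done) * elapsed_seconds / bytes_done


def format_eta(seconds: float) -> str:
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


GIB = 1024 ** 3
done, total, elapsed = 200 * GIB, 500 * GIB, 1000
print(f"{percent_complete(done, total):.0f}% done, "
      f"{format_eta(resync_eta_seconds(done, total, elapsed))} left")  # 40% done, 0:25:00 left
```

Using the average since the start is the simplest choice; a rate measured over a recent window would react faster if the sync speed changes, at the cost of a jumpier estimate.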

3) Monitoring / alerting features
- write important / critical events to windows event log

*** This one is done already and will be part of the V5.6 release. We'll also add a feature to gather a report into a text file and send it by e-mail.

- archive / compress / delete old starwind logs

*** Done already. Will be part of V5.7 release.

- Windows performance counters for each target/device, with the info from 2)

*** Post Q1/2011 approximately.

4) Integration features

*** This one is a HUGE issue. We'll probably go with WMI here.

- storage event capture. Basically, provide a simple command-line exe that takes a string parameter; when that exe is run, the string appears in a general alert for the Starwind server. This way, a user can use Windows scheduled tasks that are triggered by events recorded by their RAID cards etc., and these can be bubbled up to Starwind so that the administrator is aware of them. In the UI, the user can acknowledge each alert to make it go away (do this like Windows Home Server does Network Health).
- a special version of the above could be used for UPS alerts; on receiving the alert, Starwind could turn off WB caching on all targets
- status dump: an XML file containing the status of everything available in the UI, updated at regular intervals (frequency set by the user) for all servers being managed. This could then be queried by the user's applications.

5) 2nd level Cache and dedupe
- ability to define a target as a cache of another. So RAM would be the first-level cache, and this on-disk cache (ideally a RAID 0 or 10 of SSDs) would be the second-level cache. It could act as Write Back or Write Through.

*** This one is not that easy... Latency kills everything. Even a multi-level cache (L1 RAM and L2 SSD) is not that much faster than generic L1 RAM-cached I/O, and sometimes it is even slower. So we'll go one step at a time, with the L2 SSD cache first (as part of the de-dupe engine), and then we'll go the full storage virtualization way.
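To make the "latency kills everything" point concrete, here is a simple average-access-time model in Python (our own simplification, not StarWind's analysis): every L1 miss first pays for an L2 lookup, so when the SSD tier's hit rate is low, the extra lookup makes the average worse than RAM cache alone. The hit rates and latencies are whatever you plug in; the example inputs are made up, not StarWind measurements.

```python
def l1_only_latency(h1: float, t_ram: float, t_disk: float) -> float:
    # RAM cache only: hits cost t_ram, misses go straight to disk.
    return h1 * t_ram + (1 - h1) * t_disk


def l1_l2_latency(h1: float, h2: float, t_ram: float, t_ssd: float,
                  t_disk: float, t_l2_lookup: float) -> float:
    # Every L1 miss first pays the L2 lookup; h2 is the L2 hit rate among L1 misses.
    miss_cost = t_l2_lookup + h2 * t_ssd + (1 - h2) * t_disk
    return h1 * t_ram + (1 - h1) * miss_cost


# Made-up example inputs (any consistent time unit), not StarWind measurements.
for h2 in (0.0, 0.5, 0.9):
    print(h2, l1_only_latency(0.5, 10, 8000),
          l1_l2_latency(0.5, h2, 10, 200, 8000, 50))
```

With an L2 hit rate of zero, the two-level setup is strictly slower by the lookup cost on every L1 miss; it only wins once enough misses are served from SSD. The model ignores the cost of populating the SSD tier, which makes real results worse still.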

- block-level dedupe within targets, so identical blocks are stored once and are more likely to be in the 1st or 2nd level cache. Personally, I would not allow over-provisioning of the saved space; the objective would just be to improve performance

*** We're working on the D/D (de-dupe) engine. The first non-public beta will be ready by the end of this year. Releases V5.7 and V5.8 will have D/D as a GOLD feature.

Edit, last but not least:
6) CRC / checksum support, verification

*** No idea how useful this stuff is.

- MS initiator supports a cyclic redundancy check on iSCSI data, but Starwind does not support this - you get an initiator error if you try to connect with it enabled. It would be nice to see this feature for really important data, as a per-target option. Intel put special instructions into the Nehalem Xeons to help speed this up, so maybe the performance impact won't be too big.
- I'm not sure if the CRC feature is enough, but another thing for really important data would be verification of every write, to detect unrecoverable read errors on the actual disks before the data is read back later, when it's too late to do anything. Without this you can get silent data corruption. This is an increasing problem with ever-larger drives: the bit error rate is not improving at the same pace as capacity, so there is a greater risk of having some bad data on your disk. It's particularly a problem with cheap 1-2TB SATA drives; I even had this issue with a WD "enterprise" 2TB drive. I'm not sure whether RAID (mirroring or parity) can help with this (the better SAS drives certainly have bit error rates that are orders of magnitude better), or whether it's something the filesystem should be responsible for. But for really critical targets, it would be great to have Starwind verify that the data that just got written to disk is indeed the correct data, even if it kills performance. This should be optional of course!

Mon Dec 20, 2010 2:04 pm

We'll do automatic performance testing for sure! You're far from the first person to ask for it.

But thanks again for pointing it out!


sls · Thu Dec 30, 2010 3:16 am

Just installed a fresh copy of version 5.5 with an HA setup. It's definitely a big improvement. There are a few things that still need improving.

1. When creating a new target and defining the image file size, please allow numbers with decimals. Currently the software only accepts whole numbers such as 2TB or 1100GB. The first time, I created the new target with a 2TB image. Because ESX only accepts LUNs of up to 2TB minus 512 bytes, the exact 2TB LUN wasn't usable in ESX: you will see the target, but creating a VMFS datastore on an exact 2TB LUN fails. I tried to change it to 1.99TB, but that isn't possible. I ended up entering 2037GB instead. Allowing decimal numbers would make the software more flexible.
2. Unable to create VMFS while the HA partner target is being synced. I created a new HA target and fully synced the partner with the parent. While the partner is being synced, I can see the target in ESX, but it won't let me create the VMFS datastore. It always gets the following error: “Call "HostDatastoreSystem.CreateVmfsDatastore" for object "datastoreSystem-28" on vCenter Server "VCENTER1" failed. Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details.” In previous versions, Starwind took the HA target offline until it was fully synced. Version 5.5 improved on this by allowing the parent target to serve data while the partner target is syncing. However, this feature doesn't seem to be complete. Please have R&D review it and improve it in the next release. The bottom line is that an HA target must not suspend service when one of the nodes is offline, being synced or being rebuilt. The primary node, or the node that holds the most recent data, has to stay up or be able to be forced up to serve the data.
3. Please add a notification feature in a future release. With an HA setup, we need to know if there is an issue on any of the nodes. If one of the nodes goes down due to a hardware problem, the HA setup keeps serving data, so the administrator may not know that node is down until the second node fails as well. By then, it's too late to recover the SAN. Adding SMTP/e-mail notification would help.
4. Allow right-clicking the target and choosing “Properties” to review the original setup parameters.
5. Allow adjusting the target's original parameters without removing and recreating it.
6. Allow putting the HA target into maintenance mode and running a quick CRC check to make sure that both targets are consistent. I'm not sure if this is possible or how to do it. Since there is currently no way to verify data consistency, we have to rely on the software always writing data to both sides correctly. I'm sure no administrator wants to see the first node go down and then find out that the data on the second node is corrupted.
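The size arithmetic behind item 1 can be checked with a short Python sketch. It assumes the size field uses binary units (1GB = 1024^3 bytes) and uses the ESX limit of 2TB minus 512 bytes quoted above; `size_to_bytes` is a hypothetical helper that only shows how decimal input could be accepted safely:

```python
from decimal import Decimal

# Assumption: the size field's GB/TB are binary units (GiB/TiB).
UNITS = {"GB": 1024 ** 3, "TB": 1024 ** 4}
SECTOR = 512
# Limit quoted above for ESX LUNs: 2TB minus 512 bytes.
ESX_MAX_LUN_BYTES = 2 * UNITS["TB"] - SECTOR


def size_to_bytes(amount: str, unit: str) -> int:
    """Parse a possibly fractional size such as "1.99" TB, rounded down to whole sectors."""
    raw = int(Decimal(amount) * UNITS[unit])  # Decimal avoids float rounding surprises
    return raw - raw % SECTOR


def fits_esx(size_bytes: int) -> bool:
    return size_bytes <= ESX_MAX_LUN_BYTES


for amount, unit in (("2", "TB"), ("1.99", "TB"), ("2047", "GB"), ("2037", "GB")):
    print(amount + unit, fits_esx(size_to_bytes(amount, unit)))
```

If the binary-unit assumption holds, 2047GB would also fit under the limit, since 2048GB is exactly 2TB and only that last GB crosses it.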

I’m looking forward to testing the newer, improved release.

Thu Dec 30, 2010 7:52 am

Hi.
Thanks for your help in making StarWind better, more reliable and user friendly!
As for the sync problem, is it possible for you to zip and send us StarWind and VMkernel logs?

sls · Wed Jan 05, 2011 12:17 am

I've created support case # 00003201 and uploaded the error screenshot and the log files from the ESX and Starwind servers for the issue of being unable to create a VMFS datastore while the partner is being synced. Please look into the issue and get it fixed in the next release.

Wed Jan 05, 2011 8:16 pm

Sure! To speed up the whole process, please start a dedicated discussion thread or (better) call support by phone.

Thanks!
