Feature Request: SSD Write Smoothing

Software-based VM-centric and flash-friendly VM storage + free version

jeddyatcc
Posts: 49
Joined: Wed Apr 25, 2012 11:52 pm

Thu Jan 17, 2013 1:16 am

If I'm reading the following link correctly, it looks like you guys are using a method that turns the disk presented as LUNs into something like a database: writes go to logs first and are then flushed to the img file.

http://www.starwindsoftware.com/log-str ... m-sdk#lsfs

If possible, I would love to be able to move the swdsk files to storage with much higher random write speeds (PCIe SSD) and keep my RAID6 or RAID60 (7200rpm SATA/SAS) space for basically sequential I/O. This would save me a ton in overall costs, as the only ways that I can see to solve the "I/O Blender" effect of VM density are to either move all storage to SSD (crazily expensive) or use higher spindle speed drives in a RAID10. While these solutions solve parts of the problem, changing the disk structure to a database the way that you have can make things a lot easier.

Side note on SSD L2 Caching:
Anyone can correct me if I'm wrong, but this will not affect dense VM installations, as most of the I/O is random, so no real "hot zones" exist to properly use the cache.
anton (staff)
Site Admin
Posts: 4010
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jan 17, 2013 10:24 am

The upcoming version will have flash caching, so you'll be able to keep core structures on RAID (BTW, we'll be extremely RAID5/6 friendly, as we'll always update the full stripe set) and accelerate with a PCIe-backed flash cache.

On SSD L2 caching: it will have an effect. In VDI-like scenarios the same blocks are used by many different VMs (and we cache deduplicated data). Also, the flash cache is not "purged" on reboots, so once loaded with data it stays "hot".
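To put rough numbers on the full-stripe point, here is a minimal sketch of the textbook I/O accounting for parity RAID. It is not StarWind code; the function names and the 8+2 layout are assumptions chosen for illustration. A read-modify-write update of one chunk has to read the old data and old parity before writing the new ones, while a full-stripe write computes parity in memory and reads nothing.

```python
def small_write_ios(parity_disks):
    """Disk I/Os for a read-modify-write update of a single data chunk.

    Reads: old data chunk + each old parity chunk.
    Writes: new data chunk + each new parity chunk.
    """
    reads = 1 + parity_disks
    writes = 1 + parity_disks
    return reads + writes


def full_stripe_ios(data_disks, parity_disks):
    """Disk I/Os for writing one whole stripe (no reads needed)."""
    return data_disks + parity_disks


# Hypothetical RAID6 set: 8 data disks + 2 parity disks.
DATA_DISKS, PARITY_DISKS = 8, 2

# Updating all 8 chunks of a stripe one at a time vs. in one full-stripe write.
scattered = DATA_DISKS * small_write_ios(PARITY_DISKS)
coalesced = full_stripe_ios(DATA_DISKS, PARITY_DISKS)
print(f"chunk-by-chunk: {scattered} I/Os, full stripe: {coalesced} I/Os")
```

Under these assumptions the same amount of data costs 48 disk operations when written chunk by chunk and 10 when written as a full stripe, which is why always updating the full stripe set suits RAID5/6.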
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

jeddyatcc

Thu Jan 17, 2013 12:39 pm

To make it easier to discuss, I'm going to label items.

1. I understand that I will be able to cache via SSD, but the random writes from a VDI scenario are still being performed on disks that are not optimized for random I/O. I agree that StarWind is very friendly towards RAID6, but it would be friendlier still if it sent the random writes to other, specialized drives. Most SQL admins agree that storing the log files on RAID10 or SSD is awesome, while keeping the size and sequential read speeds of RAID5/6 for the database itself. I would just like to have the same option for StarWind. A product called Virsto seems to provide this feature at the VHDx level, but I would prefer to do it at the storage level, not on the virtual server.

2. Maybe I don't understand what you mean by Global Deduplication then. Is there something special that I need to do to take advantage of this? I have upgraded my production machines to the most recent release, but there does not seem to be any deduplication going on. As far as I can tell I can't create HA deduped devices.

3. Assuming I am correct about HA devices, then SSD caching will have no performance impact on HA scenarios.

BTW I think that ReFS and Storage Pools were purpose built for StarWind... They go together so well. I'm doing a bunch of performance testing this year as we are expanding our storage. My test box is Server 2012 DataCenter and so far StarWind performs awesomely using both of those new technologies.
anton (staff)

Thu Jan 17, 2013 1:10 pm

1) We do exactly what Virsto does: run a log-structured file system to completely eliminate random writes.

2) I did not say anything about global dedupe.

3) It will. Writes get "confirmed" once they are in the cache of all clustered nodes. At some point the RAM cache fills up with data, so without a flash cache the slow spindles come into play.

I'm very pessimistic about ReFS, which is a generic old-school file system (no log structure, no heavy caches because of its single-node nature), and about Storage Spaces (we don't plan to use the system's built-in logical volume manager), and I don't see how they can help us. Could you please clarify this? Thanks!
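As a rough illustration of point 1, the sketch below shows the general log-structured idea: writes to scattered logical block addresses are appended to the tail of a log, and an index remembers where the latest copy of each block lives. This is a toy model, not StarWind's or Virsto's implementation; the class and method names are made up, and garbage collection of stale copies is left out.

```python
class LogStructuredStore:
    """Toy log-structured block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = []     # append-only log; list index = physical position
        self.index = {}   # logical block address (LBA) -> position in the log

    def write(self, lba, data):
        """Append the block at the log tail and return its physical position."""
        position = len(self.log)
        self.log.append(data)
        self.index[lba] = position  # any older copy of this LBA is now garbage
        return position

    def read(self, lba):
        """Return the most recently written data for this LBA."""
        return self.log[self.index[lba]]


store = LogStructuredStore()
random_lbas = [907, 12, 455, 12, 3]  # scattered logical addresses, one rewritten
positions = [
    store.write(lba, f"block-{lba}-v{i}".encode())
    for i, lba in enumerate(random_lbas)
]
print(positions)       # physical positions are consecutive
print(store.read(12))  # latest copy of LBA 12
```

However random the logical addresses are, the physical positions come out consecutive, so the underlying RAID set only sees sequential writes. The cost is the index lookup on reads and the cleanup of overwritten blocks, neither of which is modeled here.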
jeddyatcc

Thu Jan 17, 2013 4:34 pm

1. Ahh, so the writes go to the RAM/WB cache, then (with the new setup) to SSD, and are then rolled to spinning disk if not read back for a while.

2. I know you did not, but http://www.starwindsoftware.com/starwind-free-features talks of Global Dedupe often. I might move this into another topic to clarify deduplication from caching.

3. See 1.
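The write path summarized in item 1 could be modeled roughly as follows. This is only a sketch of the RAM -> SSD -> spinning disk idea with least-recently-used demotion; the class, the tier sizes, and the LRU policy are assumptions for illustration and say nothing about how StarWind actually decides what to demote.

```python
from collections import OrderedDict


class TieredStore:
    """Toy RAM -> SSD -> HDD write path with LRU demotion (hypothetical)."""

    def __init__(self, ram_blocks, ssd_blocks):
        self.ram_blocks = ram_blocks
        self.ssd_blocks = ssd_blocks
        self.ram = OrderedDict()  # least recently used entry first
        self.ssd = OrderedDict()
        self.hdd = {}

    def write(self, lba, data):
        # Drop stale copies in lower tiers so only one version exists.
        self.ssd.pop(lba, None)
        self.hdd.pop(lba, None)
        self.ram[lba] = data
        self.ram.move_to_end(lba)
        self._demote()

    def read(self, lba):
        for tier in (self.ram, self.ssd):
            if lba in tier:
                tier.move_to_end(lba)  # a read keeps the block "hot"
                return tier[lba]
        return self.hdd[lba]

    def _demote(self):
        # Overflow from RAM goes to SSD, overflow from SSD goes to HDD.
        while len(self.ram) > self.ram_blocks:
            lba, data = self.ram.popitem(last=False)
            self.ssd[lba] = data
        while len(self.ssd) > self.ssd_blocks:
            lba, data = self.ssd.popitem(last=False)
            self.hdd[lba] = data

    def tier_of(self, lba):
        if lba in self.ram:
            return "ram"
        if lba in self.ssd:
            return "ssd"
        return "hdd" if lba in self.hdd else None


store = TieredStore(ram_blocks=2, ssd_blocks=2)
for lba in (1, 2, 3, 4):
    store.write(lba, b"data")
store.read(1)             # block 1 is read back, so it stays warm on SSD
store.write(5, b"data")   # pushes the coldest SSD block down to HDD
print({lba: store.tier_of(lba) for lba in (1, 2, 3, 4, 5)})
```

In this run block 2, which was never read back, is the one rolled to spinning disk, while block 1 stays on SSD because of the read.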

ReFS - On SSD or RAMDisk it is 2-5% faster, but on spinning disk it is 2-3% slower. I think that this is a great tradeoff, as I have had NTFS corruption issues in the past that ReFS protects against. We may have data that just sits for years, but is now accessed and used every day... My environment may make this look better than most others because of this.

Storage Spaces - A significant amount of time and development has gone into RAID cards to make RAID6 a very viable option (anyone using drives >2TB IMO should be using RAID6), but RAID-60 has been neglected. That said, when buying a 48-bay enclosure that supports only RAID6 and following suggested practices, you would be presenting 4 volumes to Windows. I really like the fact that those 4 volumes become 1-2 Storage Pools and that as I expand my storage these volumes just grow instead of becoming more volumes that I must manage. And yes, it looks like a respin of the dynamic disk days, but it is much better handled. Microsoft appears to recommend that you disable RAID on your storage cards, but I think that this is from a support perspective only. I am getting RAID-60 speeds from a Simple storage space while still being able to use whatever optimizations the cards have for RAID6.
anton (staff)

Thu Jan 17, 2013 8:48 pm

1) Yes, exactly. That's the way it will work.

2) Absolutely. Makes sense.

3) We'll have our own hashes protecting data from being silently corrupted, so we'll not need the rudimentary ReFS detection.

Storage Spaces used with parity are known to be write pigs. Google or search the MS forums to find more. So we'll prefer to use our own implementation.
jeddyatcc

Sat Jan 19, 2013 12:31 pm

Storage Spaces - Certainly not using parity lol, simple or mirror are the only ways that I see them being truly usable.

I'm looking at speeding up the current version of StarWind. Do you think that software like Velobit will help with the current version? If not, I might still purchase Velobit to install on the Hyper-V host servers. I mean, writing to SSD and then to iSCSI has to be faster than writing directly to iSCSI any day of the week!!
anton (staff)

Sat Jan 19, 2013 1:54 pm

We provide mirroring between hosts, so with a 2-way or 3-way replica and VM backup to a DR destination there's no sense in protecting storage on individual hosts.

There's no sense in Velobit or any other caching software, as they are built without a clustered environment in mind, and single-node caching is stone age. Providing write-back caches without distribution among different nodes is either dangerous or ineffective. It's either write-through (read: slow, and does not cache writes) or write-back (read: dangerous, as nobody knows what percentage of, say, a 4MB write page transaction had completed successfully when the power was turned off and the UPS died). Caching controllers at least have a BBU or NVRAM to ensure writes are atomic, but in the case of caching software running in the OS I don't see how to make it work without going inside the virtualization stack.
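The trade-off between the two modes can be shown with a small model of a single node with a volatile cache. It is a simplified sketch with hypothetical names: it does not model torn 4MB pages, battery-backed controllers, or replication to other nodes, only whether an acknowledged write survives a power loss.

```python
class CachedDisk:
    """Toy single-node cache in front of a disk (hypothetical, illustration only)."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.cache = {}     # volatile: lost on power failure
        self.dirty = set()  # LBAs acknowledged but not yet on disk
        self.disk = {}      # persistent

    def write(self, lba, data):
        self.cache[lba] = data
        if self.write_back:
            self.dirty.add(lba)    # acknowledged before it reaches the disk
        else:
            self.disk[lba] = data  # write-through: pays the disk latency now

    def flush(self):
        """Destage all dirty blocks to disk."""
        for lba in self.dirty:
            self.disk[lba] = self.cache[lba]
        self.dirty.clear()

    def power_loss(self):
        """Drop the volatile cache; return LBAs whose acknowledged writes were lost."""
        lost = set(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost


write_through = CachedDisk(write_back=False)
write_through.write(7, b"payload")
lost_wt = write_through.power_loss()

write_back = CachedDisk(write_back=True)
write_back.write(7, b"payload")
lost_wb = write_back.power_loss()  # power fails before flush()

print("write-through lost:", lost_wt)
print("write-back lost:", lost_wb)
```

In this model write-through never loses acknowledged data but every write waits for the disk, while write-back acknowledges quickly and loses whatever was still dirty. Keeping a second copy of dirty data in another node's cache is what removes that exposure in a clustered design.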

Keeping in mind that the flash caching version of StarWind will be in beta soon and we'll release it gold before summer, I don't see why you should waste time installing other solutions.
jeddyatcc

Sun Jan 20, 2013 2:30 pm

I agree completely, but I'm trying to find a way to streamline iSCSI traffic from the virtual hosts, as there seem to be billions of tiny packets for each read and write. I'm running a ton of performance metrics this month, so I will update the thread after I have a chance to try out Velobit and any others.
anton (staff)

Sun Jan 20, 2013 7:28 pm

Absolutely! So your feedback is appreciated. Either way we'll use it to improve our software. Thank you!
jeddyatcc

Fri Jan 25, 2013 8:11 pm

Velobit and FancyCache work beautifully, but not for Active-Active clustered environments. Until the new release of StarWind, they do provide the ability to do L2 SSD caching. I'm back to the drawing board on how to accelerate my Hyper-V cluster.
anton (staff)

Fri Jan 25, 2013 8:27 pm

I don't see block cache and write-back flash cache on a single-controller setup. It should be a very flexed environment.