LZ4 Extremely Fast Compression algorithm

Public beta (bugs, reports, suggestions, features and requests)

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

stenrulz
Posts: 14
Joined: Tue Dec 03, 2013 2:01 am

Tue Dec 03, 2013 2:13 am

Hello,

Is there any reason why there is no compression support, for example LZ4?

From my understanding, LZ4 would improve performance and also allow more data to be stored on small drives such as SSDs.
http://fastcompression.blogspot.com.au/ ... hmark.html
https://code.google.com/p/lz4/
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Dec 03, 2013 9:49 am

We've started with in-line deduplication using very small blocks (4KB, compared to Microsoft, which uses 32KB as its smallest), so it gives very good savings ratios.

We'll add compression-over-deduplication post-V8. That's already on the schedule. Thanks for pointing it out :)
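For illustration only (this is not StarWind's code, just a hypothetical sketch of fixed-block dedup): hash every 4KB block and count how many are unique; the smaller the block, the more duplicates a typical set of VM images yields. The FNV-1a hash and the toy table below are placeholders; a real engine needs a strong hash plus byte-for-byte verification of matches.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define BLOCK_SIZE 4096          /* 4KB dedup granularity, as discussed above */
#define TABLE_SIZE (1u << 20)    /* toy hash table; a real engine needs far more */

/* FNV-1a: a simple placeholder hash; a real dedup engine would use a strong
   hash and verify matches byte-for-byte to rule out collisions. */
static uint64_t fnv1a(const unsigned char *p, size_t n) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

int main(void) {
    /* Synthetic data: many repeated 4KB pages, roughly what you would see
       across a set of nearly identical Windows Server VMs. */
    size_t total = 64u * 1024 * 1024;                       /* 64MB sample */
    unsigned char *data = malloc(total);
    for (size_t i = 0; i < total; i++)
        data[i] = (unsigned char)((i / BLOCK_SIZE) % 16);   /* 16 distinct pages */

    uint64_t *seen = calloc(TABLE_SIZE, sizeof(uint64_t));
    size_t blocks = total / BLOCK_SIZE, unique = 0;

    for (size_t b = 0; b < blocks; b++) {
        uint64_t h = fnv1a(data + b * BLOCK_SIZE, BLOCK_SIZE);
        size_t slot = (size_t)(h % TABLE_SIZE);
        while (seen[slot] != 0 && seen[slot] != h)           /* linear probing */
            slot = (slot + 1) % TABLE_SIZE;
        if (seen[slot] == 0) { seen[slot] = h; unique++; }   /* new unique block */
    }

    printf("%zu blocks, %zu unique -> dedup saving %.1fx\n",
           blocks, unique, (double)blocks / (double)unique);
    free(seen);
    free(data);
    return 0;
}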
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

stenrulz
Posts: 14
Joined: Tue Dec 03, 2013 2:01 am

Tue Dec 03, 2013 10:15 am

anton (staff) wrote:We've started with in-line deduplication using very small blocks (4KB, compared to Microsoft, which uses 32KB as its smallest), so it gives very good savings ratios.

We'll add compression-over-deduplication post-V8. That's already on the schedule. Thanks for pointing it out :)
I would really like to see this in the V8 release; it is such a big deal when using SSDs, etc. For example, I currently have 20x Windows Server VMs installed with a few extras such as domain controllers, using the Beta 2 in-line deduplication. Raw data is 302GB; Storage1_1.spspx is 102GB due to deduplication, and only 26GB with compression-over-deduplication.

*** LZ4 for Windows 32-bits v1.4, by Yann Collet (Sep 17 2013) ***
Compressed filename will be : Storage1_1.spspx.lz4
Read 104856 MB (25.65%)
Compressed 109949485056 bytes into 28204578273 bytes ==> 25.65%
Press enter to continue...
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Dec 03, 2013 10:26 am

What deduplication block size do you use?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

stenrulz
Posts: 14
Joined: Tue Dec 03, 2013 2:01 am

Tue Dec 03, 2013 10:37 am

4KB. You can try LZ4 out yourself with your own .spspx. I used the high compression level, as LZ4 is really fast. http://fastcompression.blogspot.fr/p/lz4.html

I have also PMed you a link showing how compression and deduplication ratios work out great on some other SAN systems out there.
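For reference, a minimal sketch of what the "high compression level" means in liblz4 terms. This assumes the current library API (LZ4_compress_default / LZ4_compress_HC); the 2013-era v1.4 build mentioned above exposed the same fast and HC modes under older function names.

#include <stdio.h>
#include <stdlib.h>
#include "lz4.h"     /* fast compressor */
#include "lz4hc.h"   /* high-compression variant */

int main(void) {
    int src_size = 1 << 20;                      /* 1MB of fairly repetitive data */
    char *src = malloc(src_size);
    for (int i = 0; i < src_size; i++) src[i] = "ABCDABCDAB"[i % 10];

    int bound = LZ4_compressBound(src_size);     /* worst-case compressed size */
    char *dst = malloc(bound);

    int fast = LZ4_compress_default(src, dst, src_size, bound);
    int high = LZ4_compress_HC(src, dst, src_size, bound, LZ4HC_CLEVEL_MAX);

    printf("fast: %d -> %d bytes (%.2f%%)\n", src_size, fast, 100.0 * fast / src_size);
    printf("HC  : %d -> %d bytes (%.2f%%)\n", src_size, high, 100.0 * high / src_size);

    free(src);
    free(dst);
    return 0;
}

HC trades compression speed for a better ratio; decompression speed stays the same, which is why it suits write-once, read-many data.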
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Dec 03, 2013 10:50 am

Can you post the StarWind stats page so we can be sure?

I know... Ideally, compression and deduplication should be combined. Some companies (Nimble) do compression with dedupe on the roadmap, some (StarWind) do dedupe with compression to be added later, and some (Pure) do both. But for the ~$200M of VC money Pure got from the CIA and others, I would build communism for Linuxes (crossed out...) penguins in Antarctica ))

Thanks for pointing it out; either way we'll see what can be done here ASAP ))
stenrulz wrote:4KB. You can try LZ4 out yourself with your own .spspx. I used the high compression level, as LZ4 is really fast. http://fastcompression.blogspot.fr/p/lz4.html

I have also PMed you a link showing how compression and deduplication ratios work out great on some other SAN systems out there.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

stenrulz
Posts: 14
Joined: Tue Dec 03, 2013 2:01 am

Tue Dec 03, 2013 11:24 am

anton (staff) wrote:Can you post the StarWind stats page so we can be sure?

I know... Ideally, compression and deduplication should be combined. Some companies (Nimble) do compression with dedupe on the roadmap, some (StarWind) do dedupe with compression to be added later, and some (Pure) do both. But for the ~$200M of VC money Pure got from the CIA and others, I would build communism for Linuxes (crossed out...) penguins in Antarctica ))

Thanks for pointing it out; either way we'll see what can be done here ASAP ))
stenrulz wrote:4KB. You can try LZ4 out yourself with your own .spspx. I used the high compression level, as LZ4 is really fast. http://fastcompression.blogspot.fr/p/lz4.html

I have also PMed you a link showing how compression and deduplication ratios work out great on some other SAN systems out there.
Thanks, looking forward to the compression.

Also, I just noticed that on Windows 2012 R2, when creating new storage, even when selecting 4KB it uses a 4096 block size. I will create a new post about the bug once I have all the info.
stenrulz
Posts: 14
Joined: Tue Dec 03, 2013 2:01 am

Tue Dec 03, 2013 1:12 pm

There is no bug; please see the screenshots below.
[Attachment: storage1.png]
[Attachment: storage1-1.png]
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Dec 03, 2013 5:00 pm

1) There's one point you're missing (good that Alex pointed it out). When you compress a huge file you use a shared vocabulary and basically compress the whole file. To read some part of it you need to decompress it completely, which a) takes time and b) takes an insane amount of memory. That's why file systems break a file into smaller parts and compress each part independently. What I'm trying to say is that no compression-enabled system will keep the ratios you show for block storage without slowing things down completely. So we'll add compression to dedupe, but it's an optimization, and you're still not going to save huge amounts of space.

(If I've missed something obvious, please let me know.)

2) 4KB = 4096 bytes.
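A quick sketch of point 1 (illustrative only, assuming the current liblz4 API): compress the same buffer once as a whole and once as independent 4KB blocks. The per-block pass loses any history shared between blocks, so the ratio is worse, but each block can later be decompressed on its own, which is what block storage needs.

#include <stdio.h>
#include <stdlib.h>
#include "lz4.h"

#define BLOCK 4096   /* 4KB, the dedup/compression granularity discussed here */

int main(void) {
    int total = 4 * 1024 * 1024;                          /* 4MB of sample data */
    char *src = malloc(total);
    for (int i = 0; i < total; i++) src[i] = "sector-payload-"[i % 15];

    int bound = LZ4_compressBound(total);
    char *dst = malloc(bound);

    /* Whole-buffer compression: matches may reference anything seen earlier. */
    int whole = LZ4_compress_default(src, dst, total, bound);

    /* Independent 4KB blocks: no history shared between blocks, but any single
       block can be decompressed without touching the others. */
    char block_dst[LZ4_COMPRESSBOUND(BLOCK)];
    long per_block_total = 0;
    for (int off = 0; off < total; off += BLOCK)
        per_block_total += LZ4_compress_default(src + off, block_dst, BLOCK,
                                                (int)sizeof(block_dst));

    printf("whole buffer    : %d -> %d bytes\n", total, whole);
    printf("independent 4KB : %d -> %ld bytes (worse ratio, random access)\n",
           total, per_block_total);
    free(src);
    free(dst);
    return 0;
}

The gap is modest for LZ4 (its match window is only 64KB) and much larger for archivers with big dictionaries, which is exactly why whole-file archive ratios are not achievable on block storage.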
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

stenrulz
Posts: 14
Joined: Tue Dec 03, 2013 2:01 am

Wed Dec 04, 2013 12:31 am

1) I think there are methods around this but I do not remember the name. If I open 7-Zip, select a 100GB archive, and try to extract a 2MB file, it does not have to read the whole archive.
anton (staff)
Site Admin
Posts: 4008
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Wed Dec 04, 2013 8:38 am

It does quite a lot of reads, though. In general, archivers do not need to provide random reads and (the biggest issue) random writes. We have to...

Yes, we'll be adding compression to StarWind deduplication. No, it will not be *that* effective compared to compressing the whole set of VMs as a single file in an archiver.
stenrulz wrote:1) I think there are methods around this but I do not remember the name. If I open 7-Zip, select a 100GB archive, and try to extract a 2MB file, it does not have to read the whole archive.
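To make the random-access point concrete, here is a hypothetical sketch (not StarWind's design) of the usual approach for block storage: each 4KB logical block is compressed independently and a small index records where its compressed bytes live, so a read decompresses exactly one block. Random writes remain the hard part, because a rewritten block rarely compresses to the same size as before.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "lz4.h"

#define BLOCK   4096          /* logical block size */
#define NBLOCKS 256           /* 1MB toy volume */

typedef struct { long offset; int csize; } block_entry;   /* per-block index */

int main(void) {
    /* Toy logical volume: each 4KB block filled with its own pattern. */
    char *volume = malloc((size_t)NBLOCKS * BLOCK);
    for (int i = 0; i < NBLOCKS * BLOCK; i++) volume[i] = (char)(i / BLOCK);

    /* "Disk": compressed blocks stored back to back, located via the index. */
    char *disk = malloc((size_t)NBLOCKS * LZ4_COMPRESSBOUND(BLOCK));
    block_entry index[NBLOCKS];
    long write_pos = 0;

    for (int b = 0; b < NBLOCKS; b++) {
        int c = LZ4_compress_default(volume + (long)b * BLOCK, disk + write_pos,
                                     BLOCK, LZ4_COMPRESSBOUND(BLOCK));
        index[b].offset = write_pos;
        index[b].csize  = c;
        write_pos += c;
    }

    /* Random read of one logical block: decompress only that block. */
    int target = 200;
    char out[BLOCK];
    int n = LZ4_decompress_safe(disk + index[target].offset, out,
                                index[target].csize, BLOCK);

    printf("block %d: %d bytes read, matches original: %s\n", target, n,
           memcmp(out, volume + (long)target * BLOCK, BLOCK) == 0 ? "yes" : "no");
    /* A random WRITE would have to re-compress the block and cope with the new
       compressed size differing from the old one -- that is the hard part. */
    free(volume);
    free(disk);
    return 0;
}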
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
