ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: Compression prototype
Date Mon, 03 Sep 2018 23:40:10 GMT
Hi Ilya,

This is very useful. Is the compression going to be per-page, in which case
the dictionary is going to be kept inside of a page? Or do you have some
other design in mind?

D.

On Mon, Sep 3, 2018 at 10:36 AM, Ilya Kasnacheev <ilya.kasnacheev@gmail.com>
wrote:

> Hello again!
>
> I've been running various compression parameters through cod dataset.
>
> It looks like the best compression level in terms of speed is either 1 or
> 2.
> The default for Zstd seems to be 3 which would almost always perform worse.
> For best performance a dictionary of 1024 is optimal, for better
> compression
> one might choose larger dictionaries, 6k looks good but I will also run a
> few benchmarks on larger dicts. Unfortunately, Zstd crashes if sample size
> is set to more than 16k entries (I guess I should probe the max buffer size
> where problems begin).
>
> I'm attaching two charts which show what's we've got. Compression rate is a
> fraction of original records size. Time to run is wall clock time the test
> run. Reasonable compression will increase the run time twofold (of a
> program
> that only does text record parsing -> creates objects -> binarylizes them
> ->
> compresses -> decompresses). Notation: s{number of bin objects used to
> train}-d{dictionary length in bytes}-l{compression level}.
> <http://apache-ignite-developers.2346864.n4.nabble.
> com/file/t374/chart1.png>
> Second one is basically a zoom in on the first.
> <http://apache-ignite-developers.2346864.n4.nabble.
> com/file/t374/chart2.png>
> I think that in additional to dictionary compression we should have
> dictionary-less compression. On typical data of small records it shows
> compression rate of 0.8 ~ 0.65, but I can imagine that with larger
> unstructured records it can be as good as dict-based and much less of a
> hassle dictionary-processing-wise. WDYT?
> Sorry for the fine prints. I hope my charts will visible.
>
> You can see the updated code as pull request:
> https://github.com/apache/ignite/pull/4673
>
> Regards,
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message