ignite-dev mailing list archives

From Ilya Kasnacheev <ilya.kasnach...@gmail.com>
Subject Re: Compression prototype
Date Mon, 03 Sep 2018 17:36:29 GMT
Hello again!

I've been running various compression parameters against the cod dataset.

It looks like the best compression level in terms of speed is either 1 or 2.
The default for Zstd seems to be 3, which almost always performs worse.
For best performance a dictionary size of 1024 bytes is optimal; for better
compression one might choose larger dictionaries. 6k looks good, but I will
also run a few benchmarks on larger dicts. Unfortunately, Zstd crashes if the
sample size is set to more than 16k entries (I guess I should probe the max
buffer size where the problems begin).
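
To illustrate the preset-dictionary idea, here's a minimal sketch in Python using the standard library's zlib, whose zdict parameter plays the same role as a trained Zstd dictionary. Note zlib is only a stand-in here, not what the prototype uses, and the sample records, the toy "training", and the 1024-byte cap are made up for the example:

```python
import zlib

# Made-up small records standing in for binarylized objects.
samples = [('{"user":"u%d","city":"Moscow","score":%d}' % (i, i * 7)).encode()
           for i in range(1000)]

# Toy "training": reuse a few representative samples as the preset
# dictionary, capped at the dictionary length (the d{...} knob).
# Real Zstd training (ZDICT_trainFromBuffer) is far more sophisticated.
DICT_LEN = 1024
dictionary = b"".join(samples[:8])[-DICT_LEN:]

def compress(record, level=1, zdict=b""):
    # zlib supports preset dictionaries via zdict, much like Zstd dicts.
    c = zlib.compressobj(level=level, zdict=zdict)
    return c.compress(record) + c.flush()

def decompress(blob, zdict=b""):
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

record = samples[500]
plain = compress(record)                     # dictionary-less
dicted = compress(record, zdict=dictionary)  # with preset dictionary
print(len(record), len(plain), len(dicted))
```

On a record this small the dictionary version comes out noticeably shorter, since most of the record's structure is already present in the dictionary.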

I'm attaching two charts which show what we've got. Compression rate is the
fraction of the original record size. Time to run is the wall-clock time of
the test run. Reasonable compression will roughly double the run time of a
program that only parses text records -> creates objects -> binarylizes them
-> compresses -> decompresses. Notation: s{number of binary objects used for
training}-d{dictionary length in bytes}-l{compression level}.
The second chart is basically a zoom-in on the first.
I think that in addition to dictionary compression we should have
dictionary-less compression. On typical data of small records it shows a
compression rate of 0.8 ~ 0.65, but I can imagine that with larger
unstructured records it can be as good as dictionary-based compression and
much less of a hassle dictionary-processing-wise. WDYT?
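
The trade-off can be sketched the same way, again with stdlib zlib standing in for Zstd and made-up records: a shared dictionary helps a lot on a small record, while a large record carries enough internal redundancy that dictionary-less compression already does well on its own:

```python
import zlib

def rate(data, zdict=b"", level=1):
    # Compression rate as used above: compressed size / original size.
    c = zlib.compressobj(level=level, zdict=zdict)
    out = c.compress(data) + c.flush()
    return len(out) / len(data)

small = b'{"user":"u500","city":"Moscow","score":3500}'
# A larger unstructured record: many small records concatenated.
large = b"".join(
    ('{"user":"u%d","city":"Moscow","score":%d}' % (i, i * 7)).encode()
    for i in range(200))

dictionary = small * 4  # toy dictionary, not a trained one

print("small, no dict :", rate(small))
print("small, dict    :", rate(small, zdict=dictionary))
print("large, no dict :", rate(large))
```

The exact numbers here say nothing about the prototype's 0.8 ~ 0.65 figures; the sketch only shows the qualitative effect that motivates offering both modes.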
Sorry for the fine print. I hope my charts will be visible.

You can see the updated code as pull request:


Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
