ignite-dev mailing list archives

From Vyacheslav Daradur <daradu...@gmail.com>
Subject Re: Compression prototype
Date Mon, 27 Aug 2018 08:34:40 GMT
Hi Igniters!

Ilya, I'm glad to see one more person who is interested in the
compression feature in Ignite.

I looked through the pull request and want to share the following thoughts:

Using a custom algorithm in this way is very dangerous. You store
serialized data separately from its dictionary, and there are many
points at which we may lose data: rebalancing, serialization errors,
node restarts and so on.

I'd suggest the following ways to improve reliability:
- use well-known algorithms (e.g. zstd, deflate, lzma, gzip), which
allow us to decompress data in any situation
- store the dictionary inside page with data
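
To illustrate both points together, here is a minimal sketch using the
JDK's java.util.zip (deflate) with a preset dictionary that is written
into the same buffer as the compressed data. The class name and the
layout [dictLen][dict][rawLen][deflate stream] are my assumptions for
illustration only; this is not Ignite's page format and not the code
from the pull request.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical layout: [dictLen:int][dict][rawLen:int][deflate stream]. */
final class PageWithDictionary {
    /** Compresses data against dict and stores dict in the same buffer. */
    static byte[] write(byte[] dict, byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setDictionary(dict);
        deflater.setInput(data);
        deflater.finish();

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            compressed.write(chunk, 0, n);
        }
        deflater.end();

        byte[] body = compressed.toByteArray();
        return ByteBuffer.allocate(4 + dict.length + 4 + body.length)
            .putInt(dict.length).put(dict)
            .putInt(data.length).put(body)
            .array();
    }

    /** Restores the data using only what is stored in the buffer itself. */
    static byte[] read(byte[] page) {
        ByteBuffer buf = ByteBuffer.wrap(page);
        byte[] dict = new byte[buf.getInt()];
        buf.get(dict);
        byte[] data = new byte[buf.getInt()];

        Inflater inflater = new Inflater();
        inflater.setInput(page, buf.position(), buf.remaining());
        try {
            int off = 0;
            while (off < data.length) {
                int n = inflater.inflate(data, off, data.length - off);
                if (n == 0) {
                    // The zlib header signals that a preset dictionary is required.
                    if (inflater.needsDictionary())
                        inflater.setDictionary(dict);
                    else
                        throw new IllegalStateException("Truncated or corrupt page");
                }
                off += n;
            }
            return data;
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt page", e);
        } finally {
            inflater.end();
        }
    }
}
```

Because the dictionary travels with the data, read() needs no on-heap
state, so a buffer written before a restart can still be decoded after
it. The obvious cost is that the dictionary takes space in every page,
so it only pays off when the dictionary is small relative to the page.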

Also, we have had a lot of discussions [1] [2] about compression at the
BinaryObject and BinaryMarshaller level, and Vladimir Ozerov was
strictly against compression at this level.
If something has changed since then, you may look through [1] [2] [3].
I've done a lot of research comparing algorithms; it may be useful
for you.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
[2] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html
[3] https://issues.apache.org/jira/browse/IGNITE-3592
[4] https://issues.apache.org/jira/browse/IGNITE-5226
[5] https://github.com/daradurvs/ignite-compression
On Sat, Aug 25, 2018 at 2:51 AM Denis Magda <dmagda@apache.org> wrote:
>
> >
> > Currently, the dictionary for decompression is only stored on heap. After
> > restart there's compressed data in the PDS, but there's no dictionary :)
>
>
> Basically, it means that I've lost my data, right? How about persisting
> data to disk?
>
> Overall, we need Vladimir Ozerov to check the contribution. He was the one
> who sponsored the IEP and knows the area best.
>
> --
> Denis
>
> On Fri, Aug 24, 2018 at 4:31 AM Ilya Kasnacheev <ilya.kasnacheev@gmail.com>
> wrote:
>
> > Hello!
> >
> > It is somewhat a part of IEP-20, since I have updated it with this
> > particular direction.
> >
> > Regards,
> >
> > --
> > Ilya Kasnacheev
> >
> > 2018-08-24 2:56 GMT+03:00 Denis Magda <dmagda@apache.org>:
> >
> > > Hi Ilya,
> > >
> > > Sounds terrific! Is this part of the following Ignite enhancement
> > > proposal?
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite
> > >
> > > --
> > > Denis
> > >
> > > On Thu, Aug 23, 2018 at 5:17 AM Ilya Kasnacheev
> > > <ilya.kasnacheev@gmail.com> wrote:
> > >
> > > > Hello!
> > > >
> > > > My plan was to add a compression section to cache configuration, where
> > > > you can enable compression, enable key compression (which has heavier
> > > > performance implications), adjust dictionary gathering settings, and in
> > > > the future possibly choose between algorithms. In fact I'm not sure,
> > > > since my assumption is that you can always just use the latest and
> > > > greatest, but maybe we can have e.g. a very fast but weaker one vs. a
> > > > slower but stronger one.
> > > >
> > > > I'm not sure yet if we should share dictionary between all caches vs.
> > > > having separate dictionary for every cache.
> > > >
> > > > With regards to data format, of course there will be room for further
> > > > extension.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > > 2018-08-23 15:13 GMT+03:00 Sergey Kozlov <skozlov@gridgain.com>:
> > > >
> > > > > Hi Ilya
> > > > >
> > > > > Is there a plan to introduce it as an option of Ignite configuration?
> > > > > If so, instead of a boolean type I suggest using an enum, to reserve
> > > > > the ability to add compression algorithms in the future.
> > > > >
> > > > > On Thu, Aug 23, 2018 at 1:09 PM, Ilya Kasnacheev
> > > > > <ilya.kasnacheev@gmail.com> wrote:
> > > > >
> > > > > > Hello!
> > > > > >
> > > > > > I want to share with the developer community my compression
> > > > > > prototype.
> > > > > >
> > > > > > Long story short, it compresses BinaryObject's byte[] as they are
> > > > > > written to Durable Memory page, operating on a pre-built dictionary.
> > > > > > Typical compression ratio is 0.4 (meaning 2.5x compression) using
> > > > > > custom LZW+Huffman. Metadata, indexes and primitive values are
> > > > > > unaffected entirely.
> > > > > >
> > > > > > This is akin to DB2's table-level compression[1] but independently
> > > > > > invented.
> > > > > >
> > > > > > On Yardstick tests performance hit is -6% with PDS and up to -25% (in
> > > > > > throughput) with In-Memory loads. It also means you can fit ~twice as
> > > > > > much data into the same IM cluster, or have higher ram/disk ratio with
> > > > > > PDS cluster, saving on hardware or decreasing latency.
> > > > > >
> > > > > > The code is available as PR 4295[2] (set IGNITE_ENABLE_COMPRESSION=true
> > > > > > to activate). Note that it will not presently survive a PDS node
> > > > > > restart. The impact is very small, the patch should be applicable to
> > > > > > most 2.x releases.
> > > > > >
> > > > > > Sure there's a long way before this prototype can have hope of being
> > > > > > included, but first I would like to hear input from fellow igniters.
> > > > > >
> > > > > > See also IEP-20[3].
> > > > > >
> > > > > > 1. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0052331.html
> > > > > > 2. https://github.com/apache/ignite/pull/4295
> > > > > > 3. https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > --
> > > > > > Ilya Kasnacheev
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sergey Kozlov
> > > > > GridGain Systems
> > > > > www.gridgain.com
> > > > >
> > > >
> > >
> >



-- 
Best Regards, Vyacheslav D.
