ignite-dev mailing list archives

From Valentin Kulichenko <valentin.kuliche...@gmail.com>
Subject Re: Data compression in Ignite 2.0
Date Wed, 07 Jun 2017 23:04:39 GMT
Vyacheslav, Anton,

Are there any ideas and/or prototypes for the API? Your design suggestions
seem to make sense, but I would like to see how all this will look from the
user's standpoint.

-Val

On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <churaev.an@gmail.com> wrote:

> Vyacheslav, correct me if something is wrong.
>
> We could let users choose between CPU usage and memory/network usage
> by compressing some attributes of stored objects.
> You have studied the design, and it is possible to localize the changes in
> marshalling without affecting performance or current functionality.
>
> I think it is useful for our project and users.
> Community, what do you think about this proposal?
>
>
> 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>
> > In short,
> >
> > During marshalling, each field is represented by a BinaryFieldAccessor,
> > which manages its marshalling. It checks whether the field is marked with
> > the @BinaryCompression annotation; if so, the binary representation of the
> > field (a byte array) is compressed and marked as compressed with a type
> > constant (GridBinaryMarshaller.COMPRESSED). The compressed byte array is
> > then included in the binary representation of the whole object. Note that
> > the header of the marshalled object is not compressed. Compression affects
> > only the representation of the object's fields.
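A minimal Java sketch of the field-level scheme described above, under several assumptions: the annotation is redeclared locally as a stand-in for the proposed @BinaryCompression, the marker values are made up (they are not the real GridBinaryMarshaller constants), and java.util.zip.Deflater stands in for whichever algorithm is eventually chosen. It is not the code from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Local stand-in for the proposed annotation; not an existing Ignite API. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {}

/** Example value class: only the annotated field is compressed. */
class Article {
    String title;
    @BinaryCompression String body;

    Article(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

class FieldMarshalSketch {
    // Hypothetical type markers; the real Ignite type constants differ.
    static final byte STRING = 1;
    static final byte COMPRESSED = 2;

    /** Encodes one String field as [marker][payload], compressing it if annotated. */
    static byte[] marshalField(Object obj, String fieldName) {
        try {
            Field f = obj.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            byte[] raw = ((String) f.get(obj)).getBytes(StandardCharsets.UTF_8);
            boolean compress = f.isAnnotationPresent(BinaryCompression.class);
            byte[] payload = compress ? deflate(raw) : raw;
            byte[] out = new byte[payload.length + 1];
            out[0] = compress ? COMPRESSED : STRING;
            System.arraycopy(payload, 0, out, 1, payload.length);
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    /** Compresses a byte array with the JDK's DEFLATE implementation. */
    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }
}
```

Fields without the annotation take the plain path, which is why the existing behaviour stays untouched.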
> >
> > Objects in IgniteCache are represented as BinaryObject, which is a wrapper
> > over the byte array of a marshalled object.
> > BinaryObject provides some useful methods, which are used by Ignite
> > systems.
> > For example, queries use the BinaryObject#field method, which deserializes
> > only one field of the object, without deserializing the whole object.
> > If the BinaryObject#field method meets the compressed-type constant during
> > deserialization, it decompresses the byte array and then continues
> > unmarshalling as usual.
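The read path can be sketched the same way. This is an illustration only: the [marker][payload] layout and the marker values are assumptions, java.util.zip stands in for the real algorithm, and the encode helper exists just to build sample input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FieldReadSketch {
    // Hypothetical type markers; the real Ignite type constants differ.
    static final byte STRING = 1;
    static final byte COMPRESSED = 2;

    /** Reads one encoded field; decompresses first when the marker says so. */
    static String readField(byte[] encoded) {
        byte[] payload = Arrays.copyOfRange(encoded, 1, encoded.length);
        if (encoded[0] == COMPRESSED)
            payload = inflate(payload);
        // From here on, unmarshalling continues as for an uncompressed field.
        return new String(payload, StandardCharsets.UTF_8);
    }

    static byte[] inflate(byte[] data) {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                // No progress and not finished means the input is unusable.
                if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary()))
                    throw new IllegalStateException("Truncated compressed field");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed field", e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    /** Helper that builds sample input for the reader. */
    static byte[] encode(String value, boolean compress) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        if (compress) {
            Deflater d = new Deflater();
            d.setInput(payload);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!d.finished())
                out.write(buf, 0, d.deflate(buf));
            d.end();
            payload = out.toByteArray();
        }
        byte[] encoded = new byte[payload.length + 1];
        encoded[0] = compress ? COMPRESSED : STRING;
        System.arraycopy(payload, 0, encoded, 1, payload.length);
        return encoded;
    }
}
```

The caller sees the same value either way, so queries that read a single field do not need to know whether it was stored compressed.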
> >
> > For now, I have introduced the Compressor interface in IgniteConfiguration;
> > it allows the user to plug in their own compressor implementation, which is
> > the requirement in the task [1].
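A pluggable compressor might look roughly like the sketch below. The method names, the NoopCompressor default and the CompressionConfig holder are all assumptions made for illustration; the thread does not show the actual interface, and CompressionConfig is only a stand-in for the real configuration class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical shape of the pluggable interface; method names are assumptions. */
interface Compressor {
    byte[] compress(byte[] bytes);

    byte[] decompress(byte[] bytes);
}

/** Pass-through implementation: what "compression switched off" could look like. */
class NoopCompressor implements Compressor {
    public byte[] compress(byte[] bytes) {
        return bytes.clone();
    }

    public byte[] decompress(byte[] bytes) {
        return bytes.clone();
    }
}

/** Example of a user-supplied implementation backed by the JDK's DEFLATE streams. */
class DeflateCompressor implements Compressor {
    public byte[] compress(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the stream finishes the DEFLATE stream and flushes it to 'out'.
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public byte[] decompress(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buf = new byte[256];
            for (int n; (n = in.read(buf)) != -1; )
                out.write(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}

/** Stand-in for the configuration object that would carry the compressor. */
class CompressionConfig {
    private Compressor compressor = new NoopCompressor();

    CompressionConfig setCompressor(Compressor compressor) {
        this.compressor = compressor;
        return this;
    }

    Compressor getCompressor() {
        return compressor;
    }
}
```

With this shape a user would write something like `new CompressionConfig().setCompressor(new DeflateCompressor())`, and the marshaller would only ever talk to the interface.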
> >
> > As far as I know, Vladimir Ozerov doesn't like the idea of granting this
> > opportunity to the user.
> > In that case we can choose a compression algorithm to provide by
> > default and move the interface to the internals of the binary
> > infrastructure.
> > For this case I've prepared the benchmarks, which I sent earlier.
> >
> > I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
> > good throughput. It has implementations in Java, .NET and C++ and an
> > ASF-friendly license, so we can use it on all Ignite platforms.
> > You can look at an assessment of this algorithm in my benchmarks.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > [2] https://github.com/facebook/zstd
> >
> >
> > 2017-06-06 16:02 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> >
> > > Looks good to me.
> > >
> > > Could you outline the implementation design in a couple of sentences,
> > > so that we can estimate the completeness and complexity of the
> > > proposal?
> > >
> > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > >
> > > > Anton,
> > > >
> > > > Of course, the solution does not affect the existing implementation. I
> > > > mean, nothing changes (including performance) if the user does not use
> > > > the @BinaryCompression annotation.
> > > > Only if the user decides to use compression on a specific field or
> > > > fields of a class will compression be applied, at marshalling, to the
> > > > annotated fields.
> > > >
> > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> > > >
> > > > > Vyacheslav,
> > > > >
> > > > > Is it possible to propose an implementation that can be switched on
> > > > > on demand?
> > > > > In this case it should not affect the performance of the current
> > > > > solution.
> > > > >
> > > > > I mean that users should decide what is more important for them:
> > > > > throughput or memory/network usage.
> > > > > Maybe they will choose to compress not all objects, or only some
> > > > > attributes of objects.
> > > > >
> > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > >
> > > > > > Conclusion:
> > > > > > The provided solution reduces the size of an object in IgniteCache
> > > > > > at the cost of reduced throughput (a small reduction in some cases);
> > > > > > the cost depends on which part of the object is compressed and on
> > > > > > the compression algorithm.
> > > > > > I mean, we can use memory more effectively, and in some cases it can
> > > > > > reduce the load on the interconnect (replication, rebalancing).
> > > > > >
> > > > > > It will be particularly useful for object fields that are large text
> > > > > > (>~ 250 bytes) and can be compressed effectively.
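The size effect can be checked with a few lines of Java. This sketch uses the JDK's Deflater rather than any of the benchmarked algorithms, and the sample text is made up, so the printed numbers only illustrate the trend: very short values grow because of the format overhead, while long repetitive text shrinks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

class CompressionGainSketch {
    /** Returns the size of the input after DEFLATE compression. */
    static int deflatedSize(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.size();
    }

    /** Builds repetitive sample text of exactly len bytes (ASCII only). */
    static byte[] sampleText(int len) {
        String unit = "Apache Ignite stores this text field. ";
        StringBuilder sb = new StringBuilder();
        while (sb.length() < len)
            sb.append(unit);
        return Arrays.copyOf(sb.toString().getBytes(StandardCharsets.US_ASCII), len);
    }

    public static void main(String[] args) {
        for (int len : new int[] {8, 250, 4096})
            System.out.println(len + " bytes -> " + deflatedSize(sampleText(len)) + " bytes");
    }
}
```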
> > > > > >
> > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> > > > > >
> > > > > > > Vyacheslav, thank you! But could you please provide conclusions
> > > > > > > or proposals based on these benchmarks?
> > > > > > >
> > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > >
> > > > > > > > Dmitry,
> > > > > > > >
> > > > > > > > Excel pages:
> > > > > > > >
> > > > > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > > > > compression. (Conditions: literal text)
> > > > > > > > The 1st graph shows the compression ratios of different
> > > > > > > > compression algorithms depending on the size of the compressed
> > > > > > > > field.
> > > > > > > > The 2nd graph shows the size of objects depending on field sizes
> > > > > > > > and compression algorithms.
> > > > > > > >
> > > > > > > > 2) "Compression ratio (1)" - shows object size with and without
> > > > > > > > compression. (Conditions: poorly compressible character sequence)
> > > > > > > > The 1st graph shows the compression ratios of different
> > > > > > > > compression algorithms depending on the size of the compressed
> > > > > > > > field.
> > > > > > > > The 2nd graph shows the size of objects depending on field sizes
> > > > > > > > and compression algorithms.
> > > > > > > >
> > > > > > > > 3) "put-avg" - shows the average time of the "put" operation
> > > > > > > > depending on size and compression algorithm.
> > > > > > > >
> > > > > > > > 4) "put-thrpt" - shows the throughput of the "put" operation
> > > > > > > > depending on size and compression algorithm.
> > > > > > > >
> > > > > > > > 5) "get-avg" - shows the average time of the "get" operation
> > > > > > > > depending on size and compression algorithm.
> > > > > > > >
> > > > > > > > 6) "get-thrpt" - shows the throughput of the "get" operation
> > > > > > > > depending on size and compression algorithm.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > > >
> > > > > > > > > Vladimir, I am not sure how to interpret the graphs. What are
> > > > > > > > > we looking at?
> > > > > > > > >
> > > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Igniters.
> > > > > > > > > >
> > > > > > > > > > I've prepared some benchmarks. Results: [1].
> > > > > > > > > >
> > > > > > > > > > I've also prepared an evaluation in the form of diagrams [2].
> > > > > > > > > >
> > > > > > > > > > I hope this helps to interest the community and speeds up the
> > > > > > > > > > reaction to this improvement :)
> > > > > > > > > >
> > > > > > > > > > [1] https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
> > > > > > > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > Guys, any thoughts?
> > > > > > > > > > >
> > > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > > >
> > > > > > > > > > >> Hi guys,
> > > > > > > > > > >>
> > > > > > > > > > >> I've prepared a PR to show my idea.
> > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > > > > >>
> > > > > > > > > > >> About querying - I've just copied the existing tests and
> > > > > > > > > > >> annotated the test data.
> > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
> > > > > > > > > > >>
> > > > > > > > > > >> It means that fields marked with @BinaryCompression will be
> > > > > > > > > > >> compressed at marshalling via BinaryMarshaller.
> > > > > > > > > > >>
> > > > > > > > > > >> This solution has no effect on existing data or the project
> > > > > > > > > > >> architecture.
> > > > > > > > > > >>
> > > > > > > > > > >> I'll be glad to see your thoughts.
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > > >>
> > > > > > > > > > >>> Dmitriy,
> > > > > > > > > > >>>
> > > > > > > > > > >>> I have a prototype ready and want to show it.
> > > > > > > > > > >>> It is always easier to discuss with an example.
> > > > > > > > > > >>>
> > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > > > > > >>>
> > > > > > > > > > >>>> Vyacheslav,
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> I think it is a bit premature to provide a PR without getting
> > > > > > > > > > >>>> community consensus on the dev list. Please allow some time
> > > > > > > > > > >>>> for the community to respond.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> D.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > > > > > > >>>> wrote:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> > I created the ticket:
> > > > > > > > > > >>>> > https://issues.apache.org/jira/browse/IGNITE-5226
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > I'll prepare a PR with the described solution in a couple
> > > > > > > > > > >>>> > of days.
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Apache Ignite 2.0 has been released.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Let's continue the discussion about the compression design.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > At the moment, I have found only one solution that is
> > > > > > > > > > >>>> > > compatible with querying and indexing: per-object-field
> > > > > > > > > > >>>> > > compression.
> > > > > > > > > > >>>> > > Per-field compression means that the metadata (header) of an
> > > > > > > > > > >>>> > > object won't be compressed; only the serialized values of the
> > > > > > > > > > >>>> > > object's fields (in byte array form) will be compressed.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > This solution has some contentious issues:
> > > > > > > > > > >>>> > > - small values, like primitives and short arrays: there is no
> > > > > > > > > > >>>> > > sense in compressing them;
> > > > > > > > > > >>>> > > - it is not possible to use compression with Java predefined
> > > > > > > > > > >>>> > > types.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > We can provide an annotation, @IgniteCompression for example,
> > > > > > > > > > >>>> > > which users can use to mark fields for compression.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Maybe someone already has a ready design?
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > >> Alexey,
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> OK, let's discuss the public API design.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> I think we need to add a configuration entity to
> > > > > > > > > > >>>> > >> CacheConfiguration, which will contain the Compressor
> > > > > > > > > > >>>> > >> interface implementation and some useful parameters.
> > > > > > > > > > >>>> > >> Or maybe provide a BinaryMarshaller decorator, which will
> > > > > > > > > > >>>> > >> compress data after marshalling.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <akuznetsov@apache.org>:
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> Did you read the initial discussion [1] about compression?
> > > > > > > > > > >>>> > >>> As far as I remember, we agreed to add only some "top-level"
> > > > > > > > > > >>>> > >>> API in order to provide a way for Ignite users to inject
> > > > > > > > > > >>>> > >>> some sort of custom compression.
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> [1]
> > > > > > > > > > >>>> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <daradurvs@gmail.com> wrote:
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I am interested in this task:
> > > > > > > > > > >>>> > >>> > Provide some kind of pluggable compression SPI support
> > > > > > > > > > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I developed a solution at the BinaryMarshaller level, but
> > > > > > > > > > >>>> > >>> > the reviewer rejected it.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > Let's continue the discussion of the task goals and the
> > > > > > > > > > >>>> > >>> > solution design.
> > > > > > > > > > >>>> > >>> > As I understand it, the main goal of this task is to store
> > > > > > > > > > >>>> > >>> > data in compressed form.
> > > > > > > > > > >>>> > >>> > This is what I need from Ignite as its user. Compression
> > > > > > > > > > >>>> > >>> > saves on servers: we can store more data on the same
> > > > > > > > > > >>>> > >>> > servers at the cost of increased CPU utilization.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I'm researching the possibility of implementing compression
> > > > > > > > > > >>>> > >>> > at the cache level.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> --
> > > > > > > > > > >>>> > >>> Alexey
Kuznetsov
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> --
> > > > > > > > > > >>>> > >> Best Regards,
Vyacheslav
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > --
> > > > > > > > > > >>>> > > Best Regards,
Vyacheslav
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > --
> > > > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>>
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> --
> > > > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best Regards, Anton Churaev
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards, Anton Churaev
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best Regards, Anton Churaev
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>
>
>
> --
>
> Best Regards, Anton Churaev
>
