ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Custom string encoding
Date Sun, 02 Jul 2017 16:50:20 GMT
There is no need for custom encoders, as they are already built into Java.
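For reference, here is a minimal sketch of that built-in support through the JDK's `java.nio.charset` API. The class name `BuiltInCharsets` is illustrative. Note that `windows-1251` (alias `Cp1251`) ships with standard JDK builds but is not one of the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BuiltInCharsets {
    // windows-1251 (alias "Cp1251") ships with standard JDK builds, although it
    // is not one of the charsets every Java platform must support.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Number of bytes the string occupies in the given encoding.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String latin = "a";
        String cyrillic = "\u0416";  // Cyrillic capital letter ZHE
        String chinese = "\u4e2d";   // CJK ideograph "middle"

        // UTF-8 uses 1, 2 and 3 bytes for these symbols respectively.
        System.out.println(encodedLength(latin, StandardCharsets.UTF_8));     // 1
        System.out.println(encodedLength(cyrillic, StandardCharsets.UTF_8));  // 2
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_8));   // 3

        // The single-byte Cp1251 code page stores the Cyrillic letter in 1 byte,
        // and decoding with the same charset restores the original string.
        byte[] bytes = cyrillic.getBytes(CP1251);
        System.out.println(bytes.length);                                // 1
        System.out.println(new String(bytes, CP1251).equals(cyrillic));  // true
    }
}
```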

On Sun, Jul 2, 2017 at 19:16, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:

> Vladimir, how would you plug in custom encoders in your design?
>
> On Sat, Jul 1, 2017 at 11:53 PM, Vladimir Ozerov <vozerov@gridgain.com>
> wrote:
>
> > Valya,
> >
> > Personally I vote against this feature. BinaryConfiguration has proven to
> > be inconvenient: it has to be configured before node start, it cannot be
> > changed at runtime, and it requires classes on the server. Moreover, it
> > would be impossible to change the encoding later.
> >
> > I think we should add this feature at the API level instead. If a string is
> > written in a non-UTF-8 form, we will write it in a different format:
> > [encoding_code][string]
> >
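A minimal sketch of the `[encoding_code][string]` layout follows. It assumes a hypothetical one-byte code table and a four-byte length prefix, and neither of these is specified in the thread. For simplicity, every string carries a code here. The proposal, by contrast, keeps the existing format for UTF-8 strings and adds the prefix only for other encodings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EncodedStringFormat {
    // Hypothetical code table: the list index is the encoding_code on the wire.
    static final List<Charset> CODES = List.of(
        StandardCharsets.UTF_8,            // code 0
        Charset.forName("windows-1251"));  // code 1 (Cp1251)

    // Writes [encoding_code (1 byte)][length (4 bytes)][encoded bytes].
    static void writeString(DataOutputStream out, String val, Charset cs) throws IOException {
        int code = CODES.indexOf(cs);
        if (code < 0)
            throw new IllegalArgumentException("Unregistered encoding: " + cs);
        byte[] bytes = val.getBytes(cs);
        out.writeByte(code);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads the code first, so the reader learns the encoding from the data itself.
    static String readString(DataInputStream in) throws IOException {
        Charset cs = CODES.get(in.readUnsignedByte());
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, cs);
    }

    public static void main(String[] args) throws IOException {
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";  // "Privet" in Cyrillic, 6 letters
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeString(new DataOutputStream(buf), s, Charset.forName("windows-1251"));

        byte[] data = buf.toByteArray();
        System.out.println(data.length);  // 11 = 1 (code) + 4 (length) + 6 (letters)
        System.out.println(readString(new DataInputStream(new ByteArrayInputStream(data))).equals(s));  // true
    }
}
```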
> > BinaryWriter.writeString(String fieldName, String val);
> > BinaryWriter.writeString(String fieldName, String val, String encoding);
> >
> > BinaryReader.readString(String fieldName);
> > BinaryReader.readString(String fieldName, String encoding);
> >
> > BinaryObjectBuilder.writeString(String fieldName, String val, String encoding);
> >
> > class MyClass {
> >     @BinaryString(encoding = "Cp1251")
> >     private String myCyrillicString;
> > }
> >
> > Vladimir.
> >
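This sketch shows how a serializer might resolve the proposed `@BinaryString` annotation through reflection. The annotation is declared locally here as a hypothetical stand-in and is not an existing Ignite API. Fields without it fall back to UTF-8, the current default.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnnotationSketch {
    // Hypothetical annotation mirroring the proposal; not part of Ignite.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface BinaryString {
        String encoding();
    }

    static class MyClass {
        @BinaryString(encoding = "Cp1251")
        private String myCyrillicString;

        private String plainString;
    }

    // Charset a serializer would use for the field: the annotated one, else UTF-8.
    static Charset charsetFor(Field field) {
        BinaryString ann = field.getAnnotation(BinaryString.class);
        return ann == null ? StandardCharsets.UTF_8 : Charset.forName(ann.encoding());
    }

    public static void main(String[] args) throws NoSuchFieldException {
        System.out.println(charsetFor(MyClass.class.getDeclaredField("myCyrillicString")));  // windows-1251
        System.out.println(charsetFor(MyClass.class.getDeclaredField("plainString")));       // UTF-8
    }
}
```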
> > On Sat, Jul 1, 2017 at 7:26 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > wrote:
> >
> > > On Sat, Jul 1, 2017 at 2:24 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
> > > wrote:
> > >
> > > > In SQL indexes we may store partial strings and assume them to be in
> > > > UTF-8. I don't think this can be abstracted away. But maybe this is not
> > > > a big deal if we still use UTF-8 in indexes.
> > > >
> > >
> > > Sergi, why does it matter whether it is UTF-8 or a custom encoding? Why
> > > can't we use our own compact encoding in indexes?
> > >
> > >
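Partial strings tie an index to a specific encoding, and this example illustrates why. In UTF-8, continuation bytes always have the bit pattern `10xxxxxx`, so a byte prefix can be cut back to a character boundary without decoding the whole string. The sketch below (hypothetical class `Utf8Prefix`, not Ignite or H2 code) shows such a cut. With any other encoding, the index code would have to know which charset produced the bytes before it could decode or compare the prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Prefix {
    // Returns at most maxBytes (>= 0) of the UTF-8 encoding of s, cut on a
    // character boundary. If the first excluded byte is a continuation byte
    // (10xxxxxx), the cut would split a character, so it moves back.
    static byte[] prefix(String s, int maxBytes) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes)
            return utf8;
        int end = maxBytes;
        while (end > 0 && (utf8[end] & 0xC0) == 0x80)
            end--;
        return Arrays.copyOf(utf8, end);
    }

    public static void main(String[] args) {
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";  // 6 Cyrillic letters, 2 bytes each in UTF-8
        byte[] p = prefix(s, 5);  // byte 5 is in the middle of the third letter
        System.out.println(p.length);  // 4
        System.out.println(new String(p, StandardCharsets.UTF_8).equals("\u041f\u0440"));  // true
    }
}
```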
> > > >
> > > > 2017-07-01 10:13 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > >
> > > > > Val, do you know how we compare strings in SQL queries? Will we be
> > > > > able to use this encoder?
> > > > >
> > > > > Additionally, I think that the encoder is a bit too abstract. Why not
> > > > > go even further and allow users to create their own ASCII table for
> > > > > encoding?
> > > > >
> > > > > D.
> > > > >
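One reading of the "own ASCII table" idea is a user-supplied table of up to 256 symbols, where each symbol is encoded as its position in the table. Here is a sketch under that assumption. The class `TableEncoder` and the sample table are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TableEncoder {
    private final char[] table;                                  // byte value -> symbol
    private final Map<Character, Byte> codes = new HashMap<>();  // symbol -> byte value

    // Position i in the table is encoded as byte i, so at most 256 symbols fit.
    public TableEncoder(String symbols) {
        if (symbols.length() > 256)
            throw new IllegalArgumentException("At most 256 symbols fit in one byte");
        table = symbols.toCharArray();
        for (int i = 0; i < table.length; i++)
            if (codes.put(table[i], (byte) i) != null)
                throw new IllegalArgumentException("Duplicate symbol: " + table[i]);
    }

    // One byte per symbol; symbols missing from the table are rejected.
    public byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            Byte code = codes.get(s.charAt(i));
            if (code == null)
                throw new IllegalArgumentException("Symbol not in table: " + s.charAt(i));
            out[i] = code;
        }
        return out;
    }

    // Looks each byte up in the table, treating bytes as unsigned 0..255.
    public String decode(byte[] bytes) {
        char[] out = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++)
            out[i] = table[bytes[i] & 0xFF];
        return new String(out);
    }

    public static void main(String[] args) {
        // Hypothetical table: space, digits and the 64 basic Cyrillic letters U+0410..U+044F.
        StringBuilder symbols = new StringBuilder(" 0123456789");
        for (char c = '\u0410'; c <= '\u044f'; c++)
            symbols.append(c);
        TableEncoder enc = new TableEncoder(symbols.toString());

        String s = "\u041f\u0440\u0438\u0432\u0435\u0442 2017";
        byte[] bytes = enc.encode(s);
        System.out.println(bytes.length);                 // 11, one byte per symbol
        System.out.println(enc.decode(bytes).equals(s));  // true
    }
}
```

The stored bytes are meaningless without the exact table that produced them, so such a table would have to stay unchanged for as long as data encoded with it exists.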
> > > > > On Fri, Jun 30, 2017 at 6:49 PM, Valentin Kulichenko <valentin.kulichenko@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Andrey,
> > > > > >
> > > > > > Can you elaborate more on this? What is your concern?
> > > > > >
> > > > > > -Val
> > > > > >
> > > > > > On Fri, Jun 30, 2017 at 6:17 PM, Andrey Mashenkov <andrey.mashenkov@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Val,
> > > > > > >
> > > > > > > Looks like it makes sense.
> > > > > > >
> > > > > > > This will not affect the FullText index, as Lucene has its own
> > > > > > > format for storing data.
> > > > > > >
> > > > > > > But would it be compatible with H2 indexing? I doubt it.
> > > > > > >
> > > > > > > On Jul 1, 2017, 2:27, "Valentin Kulichenko" <valentin.kulichenko@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Folks,
> > > > > > > >
> > > > > > > > Currently the binary marshaller always encodes strings in UTF-8.
> > > > > > > > However, it can sometimes be useful to customize this. For
> > > > > > > > example, if data contains a lot of Cyrillic, Chinese or other
> > > > > > > > non-Latin symbols but relatively few Latin ones, memory is used
> > > > > > > > very inefficiently. In this case it would be great to encode the
> > > > > > > > most frequently used symbols in one byte instead of two or three.
> > > > > > > >
> > > > > > > > I propose to introduce a BinaryStringEncoder interface that will
> > > > > > > > convert strings to byte arrays and back, and to make it pluggable
> > > > > > > > via BinaryConfiguration. This will allow users to plug in any
> > > > > > > > encoding algorithm that suits their requirements.
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > > > > >
> > > > > > > > -Val
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
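The next sketch shows what the proposed `BinaryStringEncoder` could look like, with an implementation backed by a JDK `Charset`. Only the interface name comes from the proposal. The `encode`/`decode` method names and the `CharsetStringEncoder` class are assumptions, and the sketch does not show how the encoder would be registered in `BinaryConfiguration`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringEncoderSketch {
    // Interface name from the proposal; the method names are assumptions.
    interface BinaryStringEncoder {
        byte[] encode(String str);
        String decode(byte[] bytes);
    }

    // Hypothetical encoder backed by any JDK charset.
    static final class CharsetStringEncoder implements BinaryStringEncoder {
        private final Charset charset;

        CharsetStringEncoder(Charset charset) {
            this.charset = charset;
        }

        @Override
        public byte[] encode(String str) {
            return str.getBytes(charset);
        }

        @Override
        public String decode(byte[] bytes) {
            return new String(bytes, charset);
        }
    }

    // The current behaviour corresponds to a UTF-8 encoder.
    static final BinaryStringEncoder DEFAULT = new CharsetStringEncoder(StandardCharsets.UTF_8);

    public static void main(String[] args) {
        BinaryStringEncoder cyrillic = new CharsetStringEncoder(Charset.forName("windows-1251"));
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";  // "Privet" in Cyrillic, 6 letters

        System.out.println(DEFAULT.encode(s).length);                       // 12
        System.out.println(cyrillic.encode(s).length);                      // 6
        System.out.println(cyrillic.decode(cyrillic.encode(s)).equals(s));  // true
    }
}
```

`String.getBytes(Charset)` silently replaces characters that the charset cannot map (for example, Chinese text under Cp1251) with a replacement byte. A real encoder would therefore probably use a `java.nio.charset.CharsetEncoder` configured with `CodingErrorAction.REPORT`, so that it fails loudly instead.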
