hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Storing JSON in HBase value cell, which serialization format is most compact?
Date Thu, 13 Nov 2014 15:15:14 GMT
Keep in mind that Prefix Tree encoding has higher overhead in the write path
compared to other data block encoding methods.

Please use 0.98.7, which has the latest fixes for Prefix Tree encoding.

Cheers

On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Thanks, Ram,
>
> How about Prefix Tree-based encoding then? HBASE-4676
> <https://issues.apache.org/jira/browse/HBASE-4676> mentions that suffix
> tries are also possible? Then it could be a nice fit for JSON strings (or
> any long value where changes are small).
>
> Maybe I should just flatten the JSON into columns, hmm... what's the
> overhead for a column?
>
> Jianshi
>
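To put a rough number on "the overhead for a column": in the classic HBase KeyValue layout, every cell carries its full key (row, family, qualifier, timestamp, type) plus fixed-width length fields, before any data block encoding or compression is applied. A minimal sketch of that arithmetic, assuming the unencoded, uncompressed, tag-less KeyValue layout (key length 4 bytes, value length 4, row length 2, family length 1, timestamp 8, type 1) and ignoring HFile indexes and MemStore bookkeeping; the row key, family and qualifiers below are hypothetical.

```python
# Fixed-width fields of a classic (tag-less) HBase KeyValue, in bytes.
KEY_LENGTH = 4      # int: length of the key part
VALUE_LENGTH = 4    # int: length of the value part
ROW_LENGTH = 2      # short: length of the row key
FAMILY_LENGTH = 1   # byte: length of the column family name
TIMESTAMP = 8       # long
KEY_TYPE = 1        # byte: Put, Delete, ...

FIXED_OVERHEAD = (KEY_LENGTH + VALUE_LENGTH + ROW_LENGTH
                  + FAMILY_LENGTH + TIMESTAMP + KEY_TYPE)  # 20 bytes


def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Raw (unencoded, uncompressed) size of one cell."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def row_size(row: bytes, family: bytes, columns: dict) -> int:
    """Total raw size when each map entry is stored as its own cell."""
    return sum(cell_size(row, family, q, v) for q, v in columns.items())


if __name__ == "__main__":
    row = b"user123#20141113"  # hypothetical 16-byte row key
    family = b"d"
    # One cell holding the whole JSON document...
    print(cell_size(row, family, b"json", b'{"a":"1","b":"0","c":"1"}'))  # 66
    # ...versus the same data flattened into three cells.
    print(row_size(row, family, {b"a": b"1", b"b": b"0", b"c": b"1"}))   # 117
```

The flattened layout repeats the row key and family in every cell, which is exactly the redundancy that key-oriented data block encodings (DIFF, FASTDIFF, PREFIX_TREE) and block compression are meant to remove, so the raw numbers above overstate the on-disk cost.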
> On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > >>So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > >>value cell?
> > No, that is not possible now. All the encoding is done per KV only.
> > But what you suggest is definitely worth trying.
> >
> > >>So would you recommend storing JSON flattened as many columns?
> > Maybe, yes. But I have practically never used JSON formats, so I may not
> > be the best person to comment on this.
> >
> > Regards
> > Ram
> >
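One way to "flatten JSON into columns" while still treating the value as a Map whose schema can change: nested objects become dot-separated column qualifiers and each leaf is stored as its own JSON text. This is a hypothetical helper, not an HBase API, and it assumes JSON keys never contain "."; each resulting (qualifier, value) pair would become one column in a Put.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into {qualifier_bytes: value_bytes}.

    Nested keys are joined with '.'; lists, scalars and empty objects
    are kept as compact JSON text.
    """
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            out.update(flatten(value, path))
        else:
            text = json.dumps(value, separators=(",", ":"))
            out[path.encode("utf-8")] = text.encode("utf-8")
    return out


def unflatten(columns):
    """Inverse of flatten(): rebuild the nested dict from qualifier/value pairs."""
    result = {}
    for qualifier, raw in columns.items():
        parts = qualifier.decode("utf-8").split(".")
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = json.loads(raw)
    return result


if __name__ == "__main__":
    doc = {"id": 7, "flags": {"active": 1, "beta": 0}, "tags": ["x"]}
    print(flatten(doc))
    print(unflatten(flatten(doc)) == doc)  # True
```

A schema change then simply adds or drops qualifiers, at the cost of one full key per leaf.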
> > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > wrote:
> >
> > > Thanks, Ram,
> > >
> > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > > the value cell?
> > >
> > > So would you recommend storing JSON flattened into many columns?
> > >
> > > Jianshi
> > >
> > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Hi
> > > >
> > > > >> Since I'm storing historical data (snapshot data) and changes
> > > > >> between adjacent value cells are relatively small.
> > > >
> > > > If the values change, even slightly, FASTDIFF will rewrite the value
> > > > part. It skips the value part only when there is an exact match. JFYI.
> > > >
> > > > Regards
> > > > Ram
> > > >
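A simplified model of the value handling Ram describes, not HBase's actual FASTDIFF encoder: each cell's value is compared with the previous cell's value; an identical value is replaced by a flag, while any other value, even one differing by a single byte, is written out in full. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Each encoded entry is (same_as_previous_flag, value_bytes_or_None).
Encoded = List[Tuple[bool, Optional[bytes]]]


def encode_values(values: List[bytes]) -> Encoded:
    """Skip a value only when it exactly equals the previous one."""
    encoded: Encoded = []
    previous: Optional[bytes] = None
    for value in values:
        if value == previous:
            encoded.append((True, None))    # exact match: value omitted
        else:
            encoded.append((False, value))  # any change: full value rewritten
        previous = value
    return encoded


def decode_values(encoded: Encoded) -> List[bytes]:
    """Rebuild the original values by repeating the previous one on a flag."""
    values: List[bytes] = []
    for same, value in encoded:
        values.append(values[-1] if same else value)
    return values


def stored_bytes(encoded: Encoded) -> int:
    """Bytes spent on value payloads (the flags themselves are ignored)."""
    return sum(len(v) for same, v in encoded if not same)


if __name__ == "__main__":
    snapshots = [b'{"a":1,"b":0}', b'{"a":1,"b":0}', b'{"a":1,"b":1}']
    # 26: the third snapshot differs by one byte but is stored in full.
    print(stored_bytes(encode_values(snapshots)))
```

This is why, for snapshot data with small but non-zero changes, the saving from the encoding comes from the keys, not the values.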
> > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <
> > > > jianshi.huang@gmail.com> wrote:
> > > >
> > > > > I thought FASTDIFF was only for rowkey and columns; great if it also
> > > > > works for the value cell.
> > > > >
> > > > > And thanks for the bjson link!
> > > > >
> > > > > Jianshi
> > > > >
> > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > There is FASTDIFF data block encoding.
> > > > > >
> > > > > > See also http://bjson.org/
> > > > > >
> > > > > > Cheers
> > > > > >
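Before switching to a binary format such as the bjson link above, it may be worth measuring what the JSON text itself costs. A small sketch using only the Python standard library: it drops default whitespace with compact separators, then applies zlib (DEFLATE) as a rough stand-in for HBase block compression, which in practice is a per-column-family codec such as GZ, SNAPPY or LZO. Compressing many similar records together approximates what a data block sees; the sample records are made up.

```python
import json
import zlib


def sizes(records):
    """Byte sizes of the same records in three representations."""
    pretty = "\n".join(json.dumps(r, indent=2) for r in records).encode("utf-8")
    compact = "\n".join(
        json.dumps(r, separators=(",", ":")) for r in records
    ).encode("utf-8")
    # Many similar records compressed as one buffer, roughly like a data block.
    compressed = zlib.compress(compact, 9)
    return {"pretty": len(pretty), "compact": len(compact),
            "compressed": len(compressed)}


if __name__ == "__main__":
    records = [{"id": i, "flag_a": 1, "flag_b": 0, "flag_c": 1} for i in range(100)]
    print(sizes(records))
```

Repetitive keys and small 0/1 values compress very well at the block level, so the gain from a binary JSON encoding on top of block compression may be smaller than the raw sizes suggest.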
> > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm currently saving JSON in pure String format in the value cell
> > > > > > > and depend on HBase's block compression to reduce the overhead of
> > > > > > > JSON.
> > > > > > >
> > > > > > > I'm wondering if there's a more space-efficient way to store JSON?
> > > > > > > (There are lots of 0s and 1s; JSON String is actually an OK format.)
> > > > > > >
> > > > > > > I want to keep the value as a Map since the schema of the source
> > > > > > > data might change over time.
> > > > > > >
> > > > > > > Also, is there a DIFF-based encoding for values? I'm storing
> > > > > > > historical data (snapshot data), and changes between adjacent
> > > > > > > value cells are relatively small.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > >
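Since, per Ram's reply, FASTDIFF only skips a value on an exact match, a DIFF-style scheme for adjacent snapshots would have to live in the application. A hypothetical sketch, assuming each snapshot is a flat JSON map: the first snapshot is stored in full and each later one only as the keys that changed plus the keys that were removed; reading a version replays deltas forward from the full snapshot. None of these names are HBase APIs.

```python
from typing import Any, Dict, List

Snapshot = Dict[str, Any]


def make_delta(prev: Snapshot, curr: Snapshot) -> Dict[str, Any]:
    """Keys added or changed in curr, plus keys that disappeared."""
    changed = {k: v for k, v in curr.items() if k not in prev or prev[k] != v}
    removed = sorted(k for k in prev if k not in curr)
    return {"set": changed, "del": removed}


def apply_delta(prev: Snapshot, delta: Dict[str, Any]) -> Snapshot:
    """Produce the next snapshot from the previous one and a delta."""
    curr = dict(prev)
    for k in delta["del"]:
        curr.pop(k, None)
    curr.update(delta["set"])
    return curr


def encode_history(snapshots: List[Snapshot]) -> List[Dict[str, Any]]:
    """First entry is a full snapshot; the rest are deltas."""
    if not snapshots:
        return []
    entries: List[Dict[str, Any]] = [{"full": snapshots[0]}]
    for prev, curr in zip(snapshots, snapshots[1:]):
        entries.append(make_delta(prev, curr))
    return entries


def decode_history(entries: List[Dict[str, Any]]) -> List[Snapshot]:
    """Replay deltas forward from the full snapshot."""
    if not entries:
        return []
    snapshots = [dict(entries[0]["full"])]
    for delta in entries[1:]:
        snapshots.append(apply_delta(snapshots[-1], delta))
    return snapshots


if __name__ == "__main__":
    history = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 1, "c": 0}]
    print(encode_history(history))
    print(decode_history(encode_history(history)) == history)  # True
```

The trade-off is on reads: fetching one version means reading back to the last full snapshot, so a real design would likely store a full snapshot periodically rather than only once.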
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
