hbase-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: Is it ok to store all integers as Strings instead of byte[] in hbase?
Date Fri, 08 Jul 2016 20:24:17 GMT
Hi Mahesha,

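To answer your question *2: Both strings and numbers are stored as a
byte[]. Strings only look as if they were stored "as is" because the shell
prints printable bytes as characters and everything else as \xNN escapes.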

The value 25 can be serialized as a byte[] in many ways:

1. As a numeric string, by storing the value as [ 50, 53 ], where 50 is the
byte that represents the character '2' and 53 is the byte for character '5'.
2. As an integer, by storing the value as it is represented in binary [ 0,
0, 0, 25 ]
3. As a floating point number, by storing the value as it is represented in
binary [ 65, 200, 0, 0 ]
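
For illustration, here is a minimal sketch of the three encodings above
using only the JDK. The class and method names are made up for this
example, and it does not call HBase itself; I believe the Bytes.toBytes
overloads for String, int and float produce the same big-endian bytes, but
treat that as an assumption. Note that Java bytes are signed, so the 200
above prints as -56.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only (not part of HBase).
public class EncodingSketch {
    // Way 1: the decimal digits as text, so 25 becomes the bytes for '2' and '5'.
    static byte[] asString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Way 2: 4-byte big-endian two's complement integer.
    static byte[] asInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Way 3: 4-byte big-endian IEEE 754 single-precision float.
    static byte[] asFloat(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(asString(25))); // [50, 53]
        System.out.println(Arrays.toString(asInt(25)));    // [0, 0, 0, 25]
        System.out.println(Arrays.toString(asFloat(25f))); // [65, -56, 0, 0]
    }
}
```

All three arrays decode back to 25 only if the reader knows which encoding
was used; nothing in the bytes themselves records it.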

By default, the shell only knows how to do it the first way. I don't know
if there's a way to tell the shell to try and interpret the field value as
a non-string type before storing it.

As you discovered, though, writing Java code gives you control. You can do
it any of these ways: Bytes.toBytes(String value) will do the first,
Bytes.toBytes(int value) will do the second, and Bytes.toBytes(float value)
will do the third.

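I recommend storing numbers as binary because fixed-width binary encodings
of non-negative integers sort lexicographically in numeric order (which
makes sorting easier), whereas numeric strings do not: "10" sorts before
"9". Negative numbers and floats need extra care, because their byte order
does not match their numeric order. That said, it's important to settle on
a single format (even if you store all numbers as strings), rather than
storing some in one format and others in another.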
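
To make the ordering point concrete, here is a small sketch that compares
encoded values byte by byte, treating each byte as unsigned (the way HBase
orders row keys). The class and method names are hypothetical and the
comparison is written by hand rather than taken from HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of HBase).
public class OrderingSketch {
    static byte[] asString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] asInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Lexicographic comparison with each byte treated as unsigned (0..255).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // As strings, "9" sorts after "10" because '9' > '1'.
        System.out.println(compareBytes(asString(9), asString(10)) > 0); // true
        // As 4-byte big-endian ints, 9 sorts before 10, as expected.
        System.out.println(compareBytes(asInt(9), asInt(10)) < 0);       // true
        // Caveat: -1 is [255, 255, 255, 255], so it sorts after 1.
        System.out.println(compareBytes(asInt(-1), asInt(1)) > 0);       // true
    }
}
```

If negative values matter, one common workaround is to flip the sign bit
before storing so that byte order matches numeric order again.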

--Tom

On Fri, Jul 8, 2016 at 1:17 PM, anil gupta <anilgupta84@gmail.com> wrote:

> Hi Mahesha,
>
> I think it's not a good idea to store Numbers/Dates as String. If you store
> numbers as strings then you won't be able to do numerical/date comparison.
> HBase is Data Type Agnostic. IMO, you will be better off by using Apache
> Phoenix (http://phoenix.apache.org/). Phoenix is a SQL layer on top of
> HBase. It is ANSI SQL compliant.
>
> Currently Phoenix is officially supported by HDP and it is also present in
> Cloudera Labs.
>
> HTH,
> Anil Gupta
>
> On Fri, Jul 8, 2016 at 5:18 AM, Dima Spivak <dspivak@cloudera.com> wrote:
>
> > Hey Mahesha,
> >
> > It might be worthwhile to read through the architecture section of our
> > ref guide: https://hbase.apache.org/book.html#_architecture
> >
> > Cheers,
> >   Dima
> >
> > On Friday, July 8, 2016, Mahesha999 <abnave.m@gmail.com> wrote:
> >
> > > > I am trying out some hbase code. I realised that when I insert data
> > > > through hbase shell using put command, then everything (both numeric
> > > > and string) is put as string:
> > >
> > > hbase(main):001:0> create 'employee', {NAME => 'f'}
> > > hbase(main):003:0> put 'employee', 'ganesh','f:age',30
> > > hbase(main):004:0> put 'employee', 'ganesh','f:desg','mngr'
> > > hbase(main):005:0> scan 'employee'
> > > ROW                   COLUMN+CELL
> > > ganesh               column=f:age, timestamp=1467926618738, value=30
> > > ganesh               column=f:desg, timestamp=1467926639557, value=mngr
> > >
> > > > However when I put data using Java API, non-string stuff gets
> > > > serialized as byte[]:
> > >
> > > Cluster lNodes = new Cluster();
> > > lNodes.add("digitate-VirtualBox:8090");
> > > Client lClient= new Client(lNodes);
> > > RemoteHTable remoteht = new RemoteHTable(lClient, "employee");
> > >
> > > Put lPut = new Put(Bytes.toBytes("mahesh"));
> > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("age"), Bytes.toBytes(25));
> > > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("desg"), Bytes.toBytes("dev"));
> > > remoteht.put(lPut);
> > >
> > > > Scan in hbase shell shows age 25 of mahesh is stored as
> > > > \x00\x00\x00\x19:
> > >
> > > hbase(main):006:0> scan 'employee'
> > > ROW                   COLUMN+CELL
> > > ganesh               column=f:age, timestamp=1467926618738, value=30
> > > ganesh               column=f:desg, timestamp=1467926639557, value=mngr
> > > > mahesh               column=f:age, timestamp=1467926707712, value=\x00\x00\x00\x19
> > > mahesh               column=f:desg, timestamp=1467926707712, value=dev
> > >
> > > > *1.* Considering I will be storing only numeric and string data in
> > > > hbase, what benefit does it provide to store numeric data as byte[]
> > > > (as in the case above) or as a string:
> > > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("age"), Bytes.toBytes("25"));
> > > > //instead of toBytes(25)
> > >
> > > > *2.* Also, why are strings stored as is and not serialized to byte[]
> > > > even when put using the Java API?
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://apache-hbase.679495.n3.nabble.com/Is-it-ok-to-store-all-integers-as-Strings-instead-of-byte-in-hbase-tp4081100.html
> > > Sent from the HBase User mailing list archive at Nabble.com.
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
