hbase-user mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Persist JSON into HBase
Date Thu, 03 Feb 2011 08:47:39 GMT
Sorry for the late bump...

It is quite nice to store JSON as strings in HBase, i.e. use for
example JSONObject to convert to something like { "name" : "lars" }
and then Bytes.toBytes(jsonString).
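
In code that could look roughly like this (just a sketch, assuming the
org.json JSONObject class and the 0.90 client API; the table name
"mytable", column family "data" and qualifier "value" are made up for
illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.json.JSONObject;

public class JsonPut {
  public static void main(String[] args) throws Exception {
    // Build the JSON document, e.g. { "name" : "lars" }
    JSONObject json = new JSONObject();
    json.put("name", "lars");
    String jsonString = json.toString();

    // Store the whole JSON string in a single cell
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("data"), Bytes.toBytes("value"),
        Bytes.toBytes(jsonString));
    table.put(put);
    table.close();
  }
}
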
Since Hive now has an HBase handler you can use Hive and its built-in
JSON support to query cells like so:

select get_json_object(hbase_table.value, '$.name') from hbase_table
where key = <some-key>;

and it returns "lars".
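
For reference, the Hive side could be mapped roughly like this (again
just a sketch using the hive-hbase-handler integration, with made-up
table and column names matching the example above):

CREATE EXTERNAL TABLE hbase_table (key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:value")
TBLPROPERTIES ("hbase.table.name" = "mytable");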

Lars

On Mon, Jan 31, 2011 at 10:15 PM, Sandy Pratt <prattrs@adobe.com> wrote:
> My use of HBase is essentially what Stack describes: I serialize little log entry objects
> with (mostly) protobuf and store them in a single cell in HBase.  I did this at first because
> it was easy, and made a note to go back and break out the fields into their own columns, and
> in fact into multiple column families in some cases.  When I went back and did this, I found
> that my 'exploded' schema was actually slower to scan than the 'blob' schema was, and filters
> didn't seem to help all that much.  This was in the 0.20 days, IIRC.  So this is to say,
> +1 on storing blobs in HBase.
>
> I don't know if this would work for you, but what's worked well for me is to write side
> files for Hive to read as I ingest entries into HBase.  I like HBase for durability, random
> access, sorting, and scanning, and I'll continue to use it to store the golden copy for the
> foreseeable future, but I've found that Hive against text files is at least a couple of times
> faster than MR against an HBase source for my map reduce needs.  If you find that what you
> need from the Hive schema changes over time, you can simply nuke the files and recreate them
> with a map reduce against the golden copy in HBase.
>
> Sandy
>
