hbase-user mailing list archives

From Wilm Schumacher <wilm.schumac...@cawoom.com>
Subject Re: Nested data structures examples for HBase
Date Wed, 10 Sep 2014 06:17:01 GMT
As stated above, you can use JSON or something similar, which is always
possible. However, if you have to do that very often (and I think you
do, if you are using HBase ;) ), this could be a bad plan, because parsing
JSON is expensive in terms of CPU.

As I am relatively new to HBase (I have used it for perhaps a year,
without most of the fancy features), perhaps my suggestion is not clever
... but why not use HBase directly?

If your structure is something like

	A : "A"
	B : {
		B1 : "B1" ,
		B2 : "B2"
	}

why not use qualifiers like "data:B,B1", where "data" is your column
family?
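
A minimal sketch of that flattening, assuming a plain Python dict stands in for the nested object. The helper name `flatten`, the "data" family and the "," separator are only illustrative choices here, not part of any HBase API:

```python
def flatten(obj, family="data", sep=",", path=()):
    """Turn a nested dict into {"family:qualifier": value} pairs."""
    cells = {}
    for key, value in obj.items():
        new_path = path + (key,)
        if isinstance(value, dict):
            # Descend one layer; the path so far becomes the qualifier prefix.
            cells.update(flatten(value, family, sep, new_path))
        else:
            # Leaf value: join the path into one qualifier, e.g. "B,B1".
            cells[f"{family}:{sep.join(new_path)}"] = value
    return cells


record = {"A": "A", "B": {"B1": "B1", "B2": "B2"}}
cells = flatten(record)
```

Each entry of `cells` would then be written as one cell of the row. This assumes that none of the names contains the separator itself.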

Your explanation of your problem seems to fit this idea perfectly, as
you are not interested in JSON-like behaviour (requesting B => getting
"{B1: "B1" , B2 : "B2"}"), but rather in having a defined structure (fixed
number of layers etc.).

So if you want to query "B=>B2", why not just add "B,B2" as a qualifier
to the get request and fire?
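
As a rough illustration of such a lookup, the sketch below uses an in-memory dict as a stand-in for one row. The row layout and the helper name `get_nested` are assumptions made for the example. With a real client this would be a get restricted to a single column.

```python
# In-memory stand-in for one HBase row: {(family, qualifier): value}.
row = {
    ("data", "A"): "A",
    ("data", "B,B1"): "B1",
    ("data", "B,B2"): "B2",
}


def get_nested(row, path, family="data", sep=","):
    """Look up one nested value by joining its path into a single qualifier."""
    qualifier = sep.join(path)
    # A missing qualifier simply yields None, like an empty get result.
    return row.get((family, qualifier))


value = get_nested(row, ["B", "B2"])
```

No parsing is involved at all: the path is known, so the qualifier can be built directly and the value comes back as stored.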

This is of course only possible if the queried names are known. If not,
you have to query the whole column family, which could get very big
given your requirements below ... but it would still be possible.
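
A sketch of that fallback, assuming the whole column family has already been fetched into a dict of qualifier => value. The helper name `select_subtree` is hypothetical and the filtering here happens on the client side:

```python
def select_subtree(family_cells, prefix_path, sep=","):
    """Keep only the cells whose qualifier lies below the given path."""
    # The trailing separator keeps "B" from also matching "BB,...".
    prefix = sep.join(prefix_path) + sep
    return {q: v for q, v in family_cells.items() if q.startswith(prefix)}


# Everything stored in the "data" family of one row (illustrative values).
family_cells = {"A": "A", "B,B1": "B1", "B,B2": "B2", "BB,X": "x"}
subtree = select_subtree(family_cells, ["B"])
```

The cost is that all cells of the family travel to the client first, which is exactly the "could get very big" concern for rows with many items per level.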

However, by using a "," as separator, just as an example, parsing
the object into whatever you need should be very simple. And since you
stated that you just want to write stuff and query it directly, even
this cheap parsing shouldn't be required.
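
For the case where the nested object is needed again, the parsing could look roughly like this. It is a sketch under the assumption that qualifiers were built with "," and that no name contains a ","; `unflatten` is an illustrative name:

```python
def unflatten(cells, sep=","):
    """Rebuild a nested dict from {qualifier: value} pairs."""
    root = {}
    for qualifier, value in cells.items():
        # All parts but the last name the enclosing layers.
        *parents, leaf = qualifier.split(sep)
        node = root
        for name in parents:
            # Create the intermediate layer on first use.
            node = node.setdefault(name, {})
        node[leaf] = value
    return root


nested = unflatten({"A": "A", "B,B1": "B1", "B,B2": "B2"})
```

This is one string split and a few dict lookups per cell, with no tokenizing of the values themselves.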

This sounds much easier and much cheaper regarding CPU usage to me
than the JSON, XML, whatever plan.

Did I misunderstand your problem completely? Or does the plan outlined
above have flaws (a question to the HBase experts)?

Best wishes,


On 08.09.2014 at 23:06, Stephen Boesch wrote:
> While I am aware that HBase does not have native support for nested
> structures, surely there are some of you that have thought through this use
> case carefully.
> Our particular use case is likely having single digit nested layers with
> tens to hundreds of items in the lists at each level.
> An example would be a
>  top Level  300 items
>  middle level :  1 to 100 items  ("1 value"  may indicate a single value as
> opposed to a list)
>  third level:  1 to 50 items
>  fourth level  1 to 20 items
> The column names are likely known ahead of time- which may or may not
> matter for hbase.  We could model the above structure in a Parquet File or
> in Hive (with nested struct's)- but we would like to consider whether
> HBase might also be an option.
