hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: HBase design schema
Date Mon, 04 Apr 2011 16:25:16 GMT
Miguel,

One option is to use the simplest design and use the key you have.  Scanning
for a particular period of time will give you all the data in that time
period which you can reduce in any way that you like.
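
A rough sketch of that first option, in plain Python rather than the HBase
client API. A sorted list of (key, value) pairs stands in for the table. The
key layout (site, a "|" separator, a zero-padded timestamp) and the cell
fields "keyword" and "referrer" are assumptions for illustration only, not
the actual schema.

```python
from bisect import bisect_left
from collections import Counter


def row_key(site, ts):
    # Hypothetical layout: site, separator, zero-padded epoch seconds, so
    # that lexicographic key order matches time order within one site.
    return f"{site}|{ts:010d}"


def scan(rows, start_key, stop_key):
    # rows: list of (key, cell) sorted by key. stop_key is exclusive,
    # like the start/stop rows of a range scan.
    keys = [k for k, _ in rows]
    return rows[bisect_left(keys, start_key):bisect_left(keys, stop_key)]


def top_referrer_by_keyword(rows, site, t0, t1):
    # Reduce on the client side: count referrers per keyword over the
    # scanned period, then keep the most frequent referrer per keyword.
    counts = {}
    for _, cell in scan(rows, row_key(site, t0), row_key(site, t1)):
        counts.setdefault(cell["keyword"], Counter())[cell["referrer"]] += 1
    return {kw: c.most_common(1)[0][0] for kw, c in counts.items()}


rows = sorted(
    [
        (row_key("a.example", 100), {"keyword": "hbase", "referrer": "google"}),
        (row_key("a.example", 150), {"keyword": "hbase", "referrer": "google"}),
        (row_key("a.example", 160), {"keyword": "hbase", "referrer": "bing"}),
        (row_key("a.example", 170), {"keyword": "schema", "referrer": "bing"}),
        (row_key("a.example", 300), {"keyword": "hbase", "referrer": "bing"}),
        (row_key("b.example", 120), {"keyword": "hbase", "referrer": "yahoo"}),
    ],
    key=lambda r: r[0],
)

result = top_referrer_by_keyword(rows, "a.example", 100, 200)
```

Every row for the site in the period is read, whatever its keyword, which is
the cost that the tricks below try to reduce.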

If that becomes too inefficient, a common trick is to build a secondary file
that contains aggregated data at a lower time resolution (hourly or daily
totals, for example).
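
As a sketch of such a rollup, again in plain Python with made-up names: the
event tuple layout (site, ts, keyword, referrer) and the one-hour bucket size
are assumptions, and a dict stands in for the secondary store.

```python
from collections import Counter, defaultdict


def rollup(events, bucket_seconds=3600):
    # events: iterable of (site, ts, keyword, referrer) tuples.
    # Each (site, bucket start) entry holds counts per (keyword, referrer).
    table = defaultdict(Counter)
    for site, ts, keyword, referrer in events:
        bucket = ts - ts % bucket_seconds
        table[(site, bucket)][(keyword, referrer)] += 1
    return table


def query(table, site, t0, t1, bucket_seconds=3600):
    # Sums whole buckets, so t0 and t1 are expected to fall on bucket
    # boundaries; finer resolution is not available from the rollup.
    total = Counter()
    for bucket in range(t0 - t0 % bucket_seconds, t1, bucket_seconds):
        total += table.get((site, bucket), Counter())
    return total


events = [
    ("a.example", 10, "hbase", "google"),
    ("a.example", 3599, "hbase", "google"),
    ("a.example", 3600, "hbase", "bing"),
    ("b.example", 20, "schema", "yahoo"),
]
table = rollup(events)
two_hours = query(table, "a.example", 0, 7200)
```

A query over a long period then touches one entry per bucket instead of one
row per event.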

Another trick is to copy your original table, pushing one of your dimensions
into the key.  That will help by preventing you from scanning through data
you don't care about.  The space consumed is not so far off what an index in
a conventional database would consume.
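
A minimal sketch of the copied table, assuming Keyword is the dimension
pushed into the key. The key layout (site, keyword, zero-padded timestamp,
joined by "|") is hypothetical and assumes the separator never appears inside
a dimension value; a sorted Python list stands in for the table.

```python
from bisect import bisect_left
from collections import Counter


def keyword_key(site, keyword, ts):
    # Hypothetical key for the copy: the keyword sits between site and
    # time, so rows for one (site, keyword) pair are contiguous.
    return f"{site}|{keyword}|{ts:010d}"


def build_copy(events):
    # events: iterable of (site, ts, keyword, referrer) tuples.
    return sorted((keyword_key(s, kw, ts), ref) for s, ts, kw, ref in events)


def top_referrer(copy, site, keyword, t0, t1):
    # Range scan limited to one site and one keyword; t1 is exclusive.
    keys = [k for k, _ in copy]
    lo = bisect_left(keys, keyword_key(site, keyword, t0))
    hi = bisect_left(keys, keyword_key(site, keyword, t1))
    counts = Counter(ref for _, ref in copy[lo:hi])
    top = counts.most_common(1)[0][0] if counts else None
    return (top, hi - lo)  # top referrer and number of rows read


events = [
    ("a.example", 100, "hbase", "google"),
    ("a.example", 110, "hbase", "google"),
    ("a.example", 120, "hbase", "bing"),
    ("a.example", 130, "schema", "bing"),
    ("a.example", 500, "hbase", "bing"),
    ("b.example", 105, "hbase", "yahoo"),
]
copy = build_copy(events)
answer = top_referrer(copy, "a.example", "hbase", 100, 200)
```

Only the three matching rows are read here; rows for other keywords on the
same site are skipped because they sort outside the scanned key range.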

In general, it is important to keep in mind that HBase doesn't have
conventional relational indexes so lots of the design considerations that
motivate star schemas don't really apply.

On Mon, Apr 4, 2011 at 9:12 AM, Miguel Costa <miguel-costa@telecom.pt> wrote:

> Hi,
>
> I need some help with a schema design on HBase.
>
> I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
>
> My row key is Site+Time.
>
> Now I want to answer questions like: what is the top Referrer by
> Keyword for a site over a period of time?
>
> Basically I want to cross all the dimensions that I have. And what if I
> have 30 dimensions?
>
> What is the best schema design?
>
> Please let me know if this isn’t the right mailing list.
>
> Thank you for your time.
>
> Miguel
