hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: HBase, Hive, Hive over HBase or Pig over HBase
Date Wed, 26 Oct 2011 19:57:26 GMT

re: "30 million records."

We're obviously pro-HBase on this dist-list, but one of the challenges of
HBase (and Hadoop in general) is that the architecture tends to be
overkill for smaller datasets.  That doesn't mean you shouldn't try HBase,
but expectations should be tempered.

Especially with your requirements #5 and #6 (ad hoc queries, joins and
aggregations), an RDBMS is actually pretty good at that for smaller
volumes, which is why HBase tends to be used to generate summaries that
are loaded into an RDBMS for further slicing and dicing.

If you had an arrival rate of 30 million a day or something, then it would
be a different story.
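
To make the point about requirements #5 and #6 concrete, here is a minimal
sketch of an ad hoc join plus aggregation in a relational store. It uses
Python's built-in sqlite3 module purely as a stand-in for whatever RDBMS
you would pick. The table names (items, metrics), the columns, and the
sample rows are all made up for illustration and are not from the
original use case.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical fixed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (
            item_key INTEGER PRIMARY KEY,
            category TEXT NOT NULL
        );
        -- One integer value per (key, aspect), as in requirement 3.
        CREATE TABLE metrics (
            item_key INTEGER NOT NULL REFERENCES items(item_key),
            aspect   TEXT    NOT NULL,
            value    INTEGER NOT NULL,
            PRIMARY KEY (item_key, aspect)
        );
        """
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, "a"), (2, "a"), (3, "b")],
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [(1, "clicks", 10), (2, "clicks", 5), (3, "clicks", 7), (1, "views", 100)],
    )
    # Incremental update of a single column value (requirement 4).
    conn.execute(
        "UPDATE metrics SET value = value + 1 WHERE item_key = ? AND aspect = ?",
        (2, "clicks"),
    )
    conn.commit()
    return conn


def total_by_category(conn, aspect):
    """Ad hoc join + aggregation (requirements 5 and 6), no MapReduce job."""
    return conn.execute(
        """
        SELECT i.category, SUM(m.value)
        FROM items AS i
        JOIN metrics AS m ON m.item_key = i.item_key
        WHERE m.aspect = ?
        GROUP BY i.category
        ORDER BY i.category
        """,
        (aspect,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(total_by_category(conn, "clicks"))
```

Whether a single-node database holds up at 30 million rows depends on row
width, indexing and hardware, so treat this as a shape of the solution
rather than a sizing claim.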

On 10/26/11 3:31 PM, "viva v" <vivamailers@gmail.com> wrote:

>I am working on a use case that has the following characteristics.
>1) Data volume is on the order of 30 million records
>2) Data schema is known & is fixed (for the application we are building)
>3) Data is NOT multi format. A single key will have integer data for
>different aspects of that key
>4) Data will be incrementally updated (some column values will be updated
>at different points in time)
>5) There is a need to support ad hoc (queries are not known ahead of time)
>querying of data (without writing map reduce jobs)
>6) Queries are likely to have a lot of joins & aggregations
>Could you please help me with suggestions on whether I should use
>1) Hive
>2) HBase
>3) Hive over HBase
>4) Pig over HBase
