hbase-user mailing list archives

From viva v <vivamail...@gmail.com>
Subject Re: HBase, Hive, Hive over HBase or Pig over HBase
Date Thu, 27 Oct 2011 18:51:03 GMT
Thanks Doug.

30 million records is the starting size; the growth rate is about 1 million per

You mentioned HBase being used to generate summaries into an RDBMS; I am not
quite sure I understood this approach.
How would you generate the summaries from the raw HBase data and load them into
an RDBMS? Would we need to use a MapReduce job for this?

Could you please point me to an example use case scenario that has taken
this approach?
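To make the question concrete, here is a minimal sketch of what "generating summaries into an RDBMS" could look like, under loudly stated assumptions. The `RAW_ROWS` dict stands in for an HBase table scan, and its row-key layout (`<account>|<date>`) and the `views`/`orders` columns are hypothetical. The `ACCOUNTS` dimension table is also hypothetical. The map and reduce steps run in-process in Python rather than as a real Hadoop job, and Python's built-in `sqlite3` stands in for the RDBMS. In a real deployment the same map/reduce logic would typically run as a MapReduce job over the HBase table (HBase provides `TableMapper` and `TableMapReduceUtil` for using a table as job input), with the reduced rows written to the RDBMS, for example over JDBC.

```python
import sqlite3
from collections import defaultdict

# HYPOTHETICAL raw data standing in for an HBase table scan:
# row key "<account>|<date>" -> {column qualifier: integer value}.
RAW_ROWS = {
    "acct-001|2011-10-25": {"views": 10, "orders": 1},
    "acct-001|2011-10-26": {"views": 5, "orders": 0},
    "acct-002|2011-10-25": {"views": 7, "orders": 2},
    "acct-003|2011-10-26": {"views": 3, "orders": 1},
}

# HYPOTHETICAL dimension data assumed to already live in the RDBMS.
ACCOUNTS = [("acct-001", "east"), ("acct-002", "west"), ("acct-003", "east")]


def map_row(row_key, columns):
    """Map step: emit (account, (views, orders)) for one raw row."""
    account = row_key.split("|", 1)[0]
    return account, (columns.get("views", 0), columns.get("orders", 0))


def reduce_values(values):
    """Reduce step: sum the per-row counters collected for one account."""
    return (sum(v for v, _ in values), sum(o for _, o in values))


def summarize(raw_rows):
    """Group mapped output by key (the 'shuffle') and reduce each group."""
    grouped = defaultdict(list)
    for row_key, columns in raw_rows.items():
        key, value = map_row(row_key, columns)
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


def load_summaries(conn, summaries):
    """Write summary rows into the RDBMS; re-running replaces stale rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS account_summary "
        "(account TEXT PRIMARY KEY, views INTEGER, orders INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO account_summary VALUES (?, ?, ?)",
        [(acct, views, orders) for acct, (views, orders) in sorted(summaries.items())],
    )
    conn.commit()


def load_accounts(conn, accounts):
    """Create and fill the hypothetical accounts dimension table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (account TEXT PRIMARY KEY, region TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO accounts VALUES (?, ?)", accounts)
    conn.commit()


def totals_by_region(conn):
    """An ad hoc join + aggregation, written in plain SQL instead of MapReduce."""
    return conn.execute(
        "SELECT a.region, SUM(s.views), SUM(s.orders) "
        "FROM account_summary s JOIN accounts a ON a.account = s.account "
        "GROUP BY a.region ORDER BY a.region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_accounts(conn, ACCOUNTS)
    load_summaries(conn, summarize(RAW_ROWS))
    print(totals_by_region(conn))  # [('east', 18, 2), ('west', 7, 2)]
```

The design point is that the heavy per-row work happens once, in the batch summarization, and the resulting summary table is small. Ad hoc joins and aggregations over it can then be written as ordinary SQL.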


On Thu, Oct 27, 2011 at 1:27 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:

> re: "30 million records."
> We're obviously pro-HBase on this dist-list but one of the challenges of
> HBase (and Hadoop in general) is that the architecture can tend to be
> overkill on smaller datasets.  That doesn't mean you shouldn't try HBase,
> but expectations should be tempered.
> Especially for your requirements #5 and #6, RDBMSs are actually pretty
> good at that for smaller volumes, which is why HBase tends to be used to
> generate summaries into RDBMSs for further slicing and dicing.
> If you had an arrival rate of 30 million a day or something, then it would
> be a different story.
> On 10/26/11 3:31 PM, "viva v" <vivamailers@gmail.com> wrote:
> >Hi,
> >
> >I am working on a use case that has the following characteristics.
> >1) Data volume is of the order of 30 million records
> >2) Data schema is known & is fixed (for the application we are building)
> >3) Data is NOT multi-format. A single key will have integer data for
> >different aspects of that key
> >4) Data will be incrementally updated (some column values will be updated
> >at different points in time)
> >5) There is a need to support ad hoc querying of data (queries are not
> >known ahead of time) without writing MapReduce jobs
> >6) Queries are likely to have a lot of joins & aggregations
> >
> >Could you please help me with suggestions on whether I should use
> >1) Hive
> >2) HBase
> >3) Hive over HBase
> >4) Pig over HBase
> >
> >Thanks
> >Vivek
