hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message
Date Thu, 25 Oct 2012 16:09:45 GMT
On Thu, Oct 25, 2012 at 9:00 AM, Nick maillard
<nicolas.maillard@fifty-five.com> wrote:
> Hi Jean-Daniel,
> Again, thanks for the quick reply and for the env detail; I'll get to it.
> Of course, select count(*) is not what I want to optimize.
> My more regular queries will have an HBase schema designed for them, using the
> row keys and potentially column families, etc.
> I'm guessing Hive uses the hashed row key when it appears in the SQL query.

HBase row keys aren't hashes; HBase relies entirely on their
lexicographically sorted order. Hive really just uses HBase's input
format, which creates one map task per region. Each mapper then scans
from the start key to the end key of its region, in parallel with the
other mappers.
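
To make the point concrete, here is a small Python sketch that only simulates the behavior described above; it does not use HBase or Hive. The row keys, the split point, and the helper names (`region_ranges`, `scan`) are all made up for illustration. It shows that keys sort as raw bytes (so "user10" comes before "user2") and that each region's mapper covers one contiguous key range.

```python
from bisect import bisect_left

# Hypothetical row keys. HBase keeps rows sorted by the raw bytes of the key,
# so b"user10" sorts before b"user2".
row_keys = sorted([b"user10", b"user2", b"user1", b"user30", b"user3"])

# Hypothetical region split point: region 0 holds keys below b"user2",
# region 1 holds b"user2" and everything after it.
split_points = [b"user2"]


def region_ranges(splits):
    """Return one (start, end) pair per region; end is exclusive.

    An empty start key means "from the first row" and an end of None
    means "to the last row".
    """
    starts = [b""] + list(splits)
    ends = list(splits) + [None]
    return list(zip(starts, ends))


def scan(keys, start, end):
    """Return the keys in [start, end) from a sorted list.

    This stands in for the scan a single mapper runs over its region.
    """
    lo = bisect_left(keys, start)
    hi = len(keys) if end is None else bisect_left(keys, end)
    return keys[lo:hi]


# One simulated mapper per region, each scanning only its own key range.
rows_per_mapper = [scan(row_keys, s, e) for s, e in region_ranges(split_points)]

for (start, end), rows in zip(region_ranges(split_points), rows_per_mapper):
    print(start, end, rows)
```

A query that cannot be expressed as a row key range ends up reading every region this way, which is why the row key design matters so much for query time.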

> My question is more general. When querying HBase through Hive on tables
> that have not been designed specifically with that type of query in mind, I want
> to keep query time low. I'm trying to get a feel for when I should design a table
> with a thought-out row key, column family, etc., and to what extent I can get a decent
> query time on more exotic queries.

What kind of "decent query time" are you looking for?

> I am trying to decide whether to make several tables on a dataset for the very common
> queries and, for the rarer queries, whether Hive can give me good response times or
> whether I should use it to extract a view to feed to other query systems, like
> BigQuery or MySQL or anything else.

It really depends on what your use case is.
