hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message
Date Thu, 25 Oct 2012 15:45:29 GMT
On Thu, Oct 25, 2012 at 8:31 AM, Nick maillard
<nicolas.maillard@fifty-five.com> wrote:
> Hi Jean-Daniel,
> OK, I'll set it in the env, thanks for the advice.
> Are there other libs I might need to add?

The usual client libs. It doesn't seem like we documented them
anywhere, but it's pretty much what you have in there now.

> Could I just tell Hive to use its lib directory or HBase's lib directory in its
> classpath in some way?

That's a question for the hive ML.

> I could just set it in the bashrc but that's not very elegant.

I really meant that you should use HIVE_AUX_JARS_PATH in hive-env.sh

> Another thing: I am testing my 3-machine Hadoop cluster.
> I ran 'select * from myTestTable' on a table that has 1719428 entries.
> The 7 map tasks and 1 reducer took almost 5 minutes to compute. Am I right to
> think that is a little slow?

You have a 1-2 minute overhead in there because you are using
MapReduce. Beyond that, one should usually set
hbase.client.scanner.caching to a better value than 1. It's a
client-side setting, so Hive needs to have it. But everything will seem
slow when using MR on such a small dataset; a single client running a
scan would be faster in this case.
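
To put rough numbers on the scanner caching point, here is a small Python sketch. It assumes the caching value is simply the number of rows returned per scanner round trip, and it ignores scanner open/close calls, region boundaries, and the fact that the rows are split across map tasks. The function name and the sample caching values (100, 1000) are only illustrative, not recommendations.

```python
def scanner_round_trips(total_rows: int, caching: int) -> int:
    """Approximate number of scanner round trips needed to read total_rows,
    assuming each round trip returns at most `caching` rows."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    # Integer ceiling division: a partial last batch still costs one trip.
    return (total_rows + caching - 1) // caching


ROWS = 1719428  # row count quoted earlier in this thread

# Illustrative caching values only; the right value depends on row size and memory.
for caching in (1, 100, 1000):
    print(f"caching={caching}: ~{scanner_round_trips(ROWS, caching)} round trips")
```

With caching at 1, every row costs its own round trip, so the estimate equals the row count. A larger value trades client and region server memory for fewer trips.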

> How could I make this go faster, more map tasks, more nodes?

Is select count(*) really the use case you want to optimize? Have you
read this? http://hbase.apache.org/book.html#performance

> True, I would not usually scan a whole table, but I could easily have queries
> that run MR over a set of this size.
