hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: NoSuchColumnFamilyException with rowcounter
Date Thu, 11 Oct 2012 20:53:43 GMT
2 tasks at the same time, for a total of 25 tasks at the end.

Maybe, as you are saying, I'm not talking to the right JobTracker? I'm
running the command line on the master server.

If I look at the map tasks, I can see that:
Input Split Locations /default-rack/node1

With different values depending on the task, but on the same page I
can see machine=/default-rack/node3 (which is my master).

How/where should I run this? Should I point it to the ZooKeeper instance instead?



2012/10/11 Jean-Daniel Cryans <jdcryans@apache.org>:
> 2 tasks total or that are running at the same time? If the latter, it just
> means that you are using the local job tracker instead of your job
> tracker because HBase couldn't find your MR config.
> J-D
> On Thu, Oct 11, 2012 at 1:36 PM, Jean-Marc Spaggiari
> <jean-marc@spaggiari.org> wrote:
>> Hi J-D,
>> I have about 20M rows over 25 regions on 6 nodes. So that means I
>> should see something like 6 tasks or even 25, right? And not just 2?
>> Keys are 128 bytes long. Value is 1 byte.
>> I tried also to update mapreduce.tasktracker.map.tasks.maximum but
>> this is "the number of map tasks that should be launched on each node,
>> not the number of nodes to be used for each map task.", so there was
>> no change, as expected.
>> JM
>> 2012/10/11 Jean-Daniel Cryans <jdcryans@apache.org>:
>>> On Thu, Oct 11, 2012 at 1:20 PM, Jean-Marc Spaggiari
>>> <jean-marc@spaggiari.org> wrote:
>>>> I'm now using this command line and it's working fine (except for the
>>>> number of tasks).
>>>> HADOOP_CLASSPATH=`/home/hbase/hbase-0.94.0/bin/hbase
>>>> classpath`:`/home/hadoop/hadoop-1.0.3/bin/hadoop classpath`
>>>> /home/hadoop/hadoop-1.0.3/bin/hadoop jar
>>>> /home/hbase/hbase-0.94.0/hbase-0.94.1.jar rowcounter
>>>> -Dhbase.client.scanner.caching=100 -Dmapred.map.tasks=6
>>>> -Dmapred.map.tasks.speculative.execution=false work_proposed
>>>> I simply don't know if the -D parameters are taken into consideration
>>>> since I get the same results (numbers of tasks, time of exec, etc.)
>>>> with and without them.
>>> Using a higher caching value won't do much good if you don't have a
>>> lot of rows. Since you didn't include any data like that in your
>>> email, I won't guess how much 100 would help your case.
>>> The number of map tasks when mapping an HBase table will be the number
>>> of regions you have in that table. Unfortunately you can't change it
>>> unless you write your own input format for HBase.
>>> J-D
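
J-D's point above about the local job tracker can be checked from the
shell. What follows is only a minimal sketch, assuming a Hadoop 1.x setup
where the client reads mapred-site.xml from its conf directory and where an
unset mapred.job.tracker falls back to "local". The helper name
job_tracker_of is hypothetical, and the parsing is a simple sed match, not
a real XML parser.

```shell
# Hypothetical helper (not part of Hadoop or HBase): print the value of
# mapred.job.tracker found in the given mapred-site.xml.
# Assumption: when the file or the property is missing, Hadoop 1.x falls
# back to "local", so that is what gets printed in those cases.
job_tracker_of() {
  conf_file="$1"
  if [ ! -f "$conf_file" ]; then
    echo "local"
    return
  fi
  # Join the file into one line so <name> and <value> can be matched
  # even when they sit on separate lines.
  value=$(tr -d '\n' < "$conf_file" |
    sed -n 's:.*<name>mapred\.job\.tracker</name>[[:space:]]*<value>\([^<]*\)</value>.*:\1:p')
  echo "${value:-local}"
}
```

It could be called as job_tracker_of /home/hadoop/hadoop-1.0.3/conf/mapred-site.xml
(the conf/ location is an assumption based on the install path in the
command line quoted above). If it prints "local", the job is probably
running in the local runner on the submitting machine and not on the
cluster's JobTracker.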
