TableMapReduceUtil can set the number of mappers and reducers appropriate to the number of
regions in the table at job start time.
See TableMapReduceUtil#setNumMapTasks and TableMapReduceUtil#setNumReduceTasks
- Andy
________________________________
From: stack <stack@duboce.net>
To: hbase-user@hadoop.apache.org
Sent: Friday, June 12, 2009 11:59:27 AM
Subject: Re: HBase Failing on Large Loads
On Fri, Jun 12, 2009 at 11:50 AM, mike anderson <saidtherobot@gmail.com> wrote:
>
> I'm wondering how you set up your job to run 2 maps/1 reducer per machine.
> Is this a matter of adding more region servers? I currently have 1
> regionserver and 144 regions (living on the same cluster as hadoop).
>
TableInputFormat makes as many maps as there are regions (with some
caveats). My guess is that you only have 4 regions in your table since you
don't have that many rows? Your best bet is to study TIF#getSplits. You
could override it to get more maps or, just trust that when you have more
data in the table, and therefore more regions, more maps will be run.
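To make the "as many maps as there are regions" point concrete, here is a rough sketch in plain Java. It is not the HBase source: RegionSplitSketch and planSplits are hypothetical names, and the rule it models (one split per region, never more splits than regions, regions grouped evenly when fewer maps are requested) is only an assumption about what the caveats amount to. Read TIF#getSplits for the real behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of region-based split planning; not HBase code.
public class RegionSplitSketch {

    /**
     * Returns, for each planned split, how many consecutive regions it covers.
     * Assumption: the number of splits is capped at the number of regions.
     */
    static List<Integer> planSplits(int regionCount, int requestedMaps) {
        List<Integer> regionsPerSplit = new ArrayList<>();
        if (regionCount <= 0 || requestedMaps <= 0) {
            return regionsPerSplit; // nothing to scan, or no maps requested
        }
        // Never plan more splits than there are regions.
        int splits = Math.min(regionCount, requestedMaps);
        int base = regionCount / splits;      // regions every split gets
        int remainder = regionCount % splits; // leftover regions, spread one each
        for (int i = 0; i < splits; i++) {
            regionsPerSplit.add(i < remainder ? base + 1 : base);
        }
        return regionsPerSplit;
    }

    public static void main(String[] args) {
        // A table with 4 regions and 10 requested maps still yields 4 splits.
        System.out.println(planSplits(4, 10).size());
        // 144 regions with 144 requested maps yields one split per region.
        System.out.println(planSplits(144, 144).size());
        // 5 regions squeezed into 2 maps: region counts per split.
        System.out.println(planSplits(5, 2));
    }
}
```

Under this model, getting more maps than regions means overriding the split logic to cut each region's key range into smaller pieces.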
On the reduce side, I'm not sure. Check TableOutputFormat but I'd say 1
reduce per machine is default. In this case hbase is probably respecting
what you have configured in your hadoop-site.xml/mapred-site.xml.
St.Ack