Generally you want the number of map tasks to equal the number of table regions.
From there, you configure in the Hadoop config how many of them run at the
same time on each machine (mapred.tasktracker.map.tasks.maximum in
hadoop-site.xml/mapred-site.xml).
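As a rough illustration of how those two numbers interact, here is a minimal back-of-the-envelope sketch. It assumes one map task per region and a fixed number of map slots per machine. The class and method names are hypothetical and this is not Hadoop or HBase API; the cluster size and slot count are made-up example values, and only the 144 regions come from this thread.

```java
/**
 * Hypothetical model (not Hadoop/HBase API): one map task per region,
 * and each machine runs at most `mapSlotsPerMachine` maps at once
 * (the per-TaskTracker limit, mapred.tasktracker.map.tasks.maximum).
 */
final class MapSlotEstimate {

    /** Map tasks that can run at the same time across the whole cluster. */
    static int concurrentMaps(int machines, int mapSlotsPerMachine) {
        return machines * mapSlotsPerMachine;
    }

    /** Rounds ("waves") needed to run one map per region: ceiling division. */
    static int waves(int regions, int machines, int mapSlotsPerMachine) {
        int slots = concurrentMaps(machines, mapSlotsPerMachine);
        return (regions + slots - 1) / slots;
    }

    public static void main(String[] args) {
        int regions = 144; // region count mentioned in this thread
        int machines = 4;  // hypothetical cluster size
        int slots = 2;     // hypothetical map slots per machine
        System.out.println("map tasks:       " + regions);
        System.out.println("concurrent maps: " + concurrentMaps(machines, slots));
        System.out.println("waves:           " + waves(regions, machines, slots));
    }
}
```

Raising the per-machine slot count shortens the number of waves but does not change how many map tasks the job has. That number still follows the region count.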
On Fri, Jun 12, 2009 at 11:59 AM, stack <stack@duboce.net> wrote:
> On Fri, Jun 12, 2009 at 11:50 AM, mike anderson <saidtherobot@gmail.com> wrote:
>
> >
> > I'm wondering how you set up your job to run 2 maps/1 reducer per machine.
> > Is this a matter of adding more region servers? I currently have 1
> > regionserver and 144 regions (living on the same cluster as Hadoop).
> >
>
> TableInputFormat makes as many maps as there are regions (with some
> caveats). My guess is that you only have 4 regions in your table since you
> don't have that many rows? Your best bet is to study TIF#getSplits. You
> could override it to get more maps or just trust that when you have more
> data in the table, and therefore more regions, more maps will be run.
>
> On the reduce side, I'm not sure. Check TableOutputFormat, but I'd say 1
> reduce per machine is the default. In this case HBase is probably respecting
> what you have configured in your hadoop-site.xml/mapred-site.xml.
>
> St.Ack
>
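St.Ack's suggestion about TIF#getSplits can be modelled with a small stdlib-only sketch: the default behaviour yields one split (one map) per region, and a hypothetical override cuts each region's key range into several contiguous sub-ranges to get more maps. This is a simplification under stated assumptions. Row keys are modelled as longs (real HBase row keys are byte arrays), and `KeyRange`, `oneSplitPerRegion` and `subdivide` are illustrative names, not the real TableInputFormat/TableSplit API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of how splits relate to regions. Keys are simplified
 * to longs; this is not the HBase TableInputFormat/TableSplit API.
 */
final class SplitSketch {

    /** Half-open key range [start, end) standing in for a region or split. */
    record KeyRange(long start, long end) {}

    /** Behaviour being modelled: one split, and therefore one map, per region. */
    static List<KeyRange> oneSplitPerRegion(List<KeyRange> regions) {
        return new ArrayList<>(regions);
    }

    /**
     * Hypothetical override: cut each region into `perRegion` contiguous,
     * non-overlapping sub-ranges (fewer if the region has fewer keys).
     */
    static List<KeyRange> subdivide(List<KeyRange> regions, int perRegion) {
        if (perRegion < 1) {
            throw new IllegalArgumentException("perRegion must be >= 1");
        }
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange r : regions) {
            long width = r.end() - r.start();
            int pieces = (int) Math.min(perRegion, Math.max(1, width));
            for (int i = 0; i < pieces; i++) {
                // Integer boundaries; the last piece ends exactly at r.end().
                long s = r.start() + width * i / pieces;
                long e = r.start() + width * (i + 1) / pieces;
                splits.add(new KeyRange(s, e));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical table with 4 regions.
        List<KeyRange> regions = List.of(
                new KeyRange(0, 100), new KeyRange(100, 250),
                new KeyRange(250, 400), new KeyRange(400, 1000));
        System.out.println("default maps:    " + oneSplitPerRegion(regions).size());
        System.out.println("subdivided maps: " + subdivide(regions, 3).size());
    }
}
```

Subdividing by key range only balances work if rows are spread evenly across each region's keys. Otherwise it is simpler to wait for the table to grow and split into more regions, as suggested above.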