hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: MapReduce: Reducers partitions.
Date Thu, 11 Apr 2013 11:52:46 GMT
Thanks all for your comments.

I looked for partitioners within the HBase scope only, which is why I
thought we were using HTablePartitioner. But looking at the default one
actually used, I found
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner, as St.Ack
confirmed. And it does exactly what I was describing for the keyhash
(and not keycrc).
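
For reference, HashPartitioner's getPartition() is essentially a
one-liner (paraphrased from the Hadoop source):

      public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so negative hashCodes still yield a
        // valid index, then spread keys across reducers by hash modulo.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }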

Changing HRegionPartitioner's behaviour would also be useless, because
TableMapReduceUtil overwrites the number of reducers if we have set it
higher than the number of regions:

      if (job.getNumReduceTasks() > regions) {
        job.setNumReduceTasks(outputTable.getRegionsInfo().size());
      }
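
So, as a hypothetical sketch (MyReducer and "myTable" are placeholders),
a driver asking for 20 reducers would be silently capped to the table's
region count:

      Job job = Job.getInstance(conf, "bulk-write");
      job.setNumReduceTasks(20); // capped by the check above
      TableMapReduceUtil.initTableReducerJob("myTable", MyReducer.class,
          job, HRegionPartitioner.class);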

So I just need to stay with the default partitioner then.

Thanks,

JM



2013/4/10 Stack <stack@duboce.net>

> On Wed, Apr 10, 2013 at 12:01 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Greame,
> >
> > No. The reducer will simply write to the table the same way you would
> > with a regular Put. If a split is required because of the size, then
> > the region will be split, but in the end there will not necessarily be
> > any region split.
> >
> > In the use case described below, all 600 lines will "simply" go into
> > the only region in the table and no split will occur.
> >
> > The goal is to partition the data for the reducer only. Not in the table.
> >
>
>
> Then just use the default partitioner?
>
> The suggestion that you use HTablePartitioner seems inappropriate to your
> task.  See the HashPartitioner doc here:
>
> http://hadoop.apache.org/docs/r2.0.3-alpha/api/org/apache/hadoop/mapreduce/lib/partition/HashPartitioner.html
>
> St.Ack
>
