hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: TableMapper and getSplits
Date Fri, 02 Apr 2010 20:08:18 GMT
Splitting a table on its regions makes the most sense when only one table is
involved.  For your case, just override the splitter (getSplits) and make
different split objects.
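Here is a rough sketch of what that could involve. An overridden getSplits() might call super.getSplits() and then cut each region's [start row, end row) range into several narrower ranges. Each range would be wrapped in its own TableSplit that keeps the original region location, so the extra mappers still read from the same regionserver. The Java below covers only the key-range arithmetic and deliberately uses no HBase classes. KeyRangeSplitter and boundaries() are hypothetical names. Zero-padding the keys and treating them as unsigned integers is an assumption of this sketch, not something described in this thread.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of HBase): computes boundary row keys that
 * divide the half-open key range [start, end) into at most n sub-ranges.
 * Keys are read as unsigned big-endian integers, right-padded with zero
 * bytes to a common length plus one extra byte of resolution, so the
 * numeric order of the padded values agrees with lexicographic row-key order.
 */
final class KeyRangeSplitter {

    private KeyRangeSplitter() {}

    /**
     * Returns boundaries b0 = start < b1 < ... < bk = end with k <= n;
     * each consecutive pair is one sub-range. An empty start key (the
     * table's first region) works. An empty end key (the last region) is
     * rejected because it pads to zero, so that region needs an explicit
     * upper bound or should be left unsplit.
     */
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int len = Math.max(start.length, end.length) + 1;
        BigInteger lo = toUnsigned(start, len);
        BigInteger hi = toUnsigned(end, len);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException(
                "start must sort before end (after zero-padding)");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> out = new ArrayList<>();
        out.add(start.clone());
        BigInteger prev = lo;
        for (int i = 1; i < n; i++) {
            // Evenly spaced cut point; floor division can repeat values for
            // very narrow ranges, so only keep cuts that strictly advance.
            BigInteger cut = lo.add(span.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(n)));
            if (cut.compareTo(prev) > 0) {
                out.add(toBytes(cut, len));
                prev = cut;
            }
        }
        out.add(end.clone());
        return out;
    }

    // Right-pads key with zero bytes to len and reads it as an unsigned integer.
    private static BigInteger toUnsigned(byte[] key, int len) {
        return new BigInteger(1, Arrays.copyOf(key, len));
    }

    // Writes v as exactly len big-endian bytes (v is known to fit in len bytes).
    private static byte[] toBytes(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry a leading 0x00 sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

For example, boundaries(new byte[]{0x00}, new byte[]{0x10}, 4) returns five keys: 00, 04 00, 08 00, 0C 00 and 10 (hex), which define four sub-ranges. Dividing evenly by key value only balances the work if rows are spread evenly across the key space. With skewed keys, you would want to pick cut points from sampled row keys instead.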

As to HBase being 'underloaded' with one task per region, I'd say try it
first.  If there are many regions on the one regionserver, that could make
for a decent load on the hosting regionserver.

Good luck,
St.Ack

On Fri, Apr 2, 2010 at 12:19 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> Hello,
>
> I have subclassed TableInputFormat and TableMapper. My job needs to read
> from two tables (one row from each) during its map method. The reduce
> method needs to write out to a table. For both the reads and the writes,
> I am using simple Get and Put respectively with autoflush true.
>
> One problem I see is that the number of map tasks that I get with HBase
> is limited to the number of regions in the table. This seems to make the
> job slower than it would be if I had many more mappers. Could I improve
> the situation by overriding getSplits so that I could have many more
> mappers?
>
> I saw the following doc'd in TableMapReduceUtil: "Ensures that the given
> number of reduce tasks for the given job configuration does not exceed
> the number of regions for the given table. " Is there some reason one
> would want to ensure that the number of tasks doesn't exceed the number
> of regions? It just seems to me that having one region serve only a
> single task would result in an underloaded HBase. Thoughts?
>
> -geoff
>
