hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: How to set number of mappers when using HBaseStorage
Date Wed, 21 May 2014 15:20:03 GMT
Hansi's scheduler configuration is the real solution here, but combining
more regions into a single split is useful for other reasons.  Specifically
it helps control load against an HBase cluster from the job; you don't
always want 50 mappers running against a single regionserver.

We run into this a lot at HubSpot, so I've created my own extension of
TableInputFormat and corresponding RecordReader so that you can partition
the mappers by regionserver.  This allows you to split all the regions for
a regionserver into a configurable number of mappers (1-N).  I haven't
contributed this yet, but you can get the code at
https://gist.github.com/bbeaudreault/9788499
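
For illustration only, here is a minimal sketch of the grouping idea described above. It is not the code from the gist and it does not use the HBase API. The names RegionGrouper, Region, and group are hypothetical, and the round-robin assignment is just one assumed way to spread a regionserver's regions over its mappers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups regions by the regionserver hosting them and
// divides each server's regions among at most N mapper groups.
final class RegionGrouper {

    /** Hypothetical stand-in for a region name plus its hosting regionserver. */
    static final class Region {
        final String name;
        final String server;

        Region(String name, String server) {
            this.name = name;
            this.server = server;
        }
    }

    /**
     * Returns one list of region names per mapper. Every list contains
     * regions from a single regionserver, and no regionserver gets more
     * than mappersPerServer lists.
     */
    static List<List<String>> group(List<Region> regions, int mappersPerServer) {
        if (mappersPerServer < 1) {
            throw new IllegalArgumentException("mappersPerServer must be >= 1");
        }

        // Collect region names per regionserver, preserving input order.
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (Region r : regions) {
            byServer.computeIfAbsent(r.server, k -> new ArrayList<>()).add(r.name);
        }

        List<List<String>> groups = new ArrayList<>();
        for (List<String> names : byServer.values()) {
            // Never create more groups than the server has regions.
            int buckets = Math.min(mappersPerServer, names.size());
            List<List<String>> serverGroups = new ArrayList<>();
            for (int i = 0; i < buckets; i++) {
                serverGroups.add(new ArrayList<>());
            }
            // Assumed policy: round-robin assignment within the server.
            for (int i = 0; i < names.size(); i++) {
                serverGroups.get(i % buckets).add(names.get(i));
            }
            groups.addAll(serverGroups);
        }
        return groups;
    }
}
```

With mappersPerServer set to 1, each regionserver is read by a single mapper, so the mapper count equals the number of regionservers rather than the number of regions.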




On Wed, May 21, 2014 at 11:12 AM, Pradeep Gollakota <pradeepg26@gmail.com> wrote:

> I just looked at the source code for HBaseStorage. It uses a modified
> version of TableInputFormat under the hood. TableInputFormat, AFAIK, does
> not support controlling the number of launched Map tasks. It might be a
> worthwhile contribution to HBase to write an analogous version of a
> CombineInputFormat, so a single Map task can read multiple regions.
>
>
> On Wed, May 21, 2014 at 10:21 AM, Hansi Klose <hansi.klose@web.de> wrote:
>
> > Hi Lei,
> >
> > I don't know if this helps you, but I had the same problem with the
> > replication verify jobs I run in our environment.
> >
> > I created a fairscheduler pool called "admin" on the jobtracker and
> > configured this pool with the maximum number of mappers the job should
> > take.
> >
> > I added this section to my hbase-site.xml:
> >
> >   <property>
> >     <name>mapred.queue.name</name>
> >     <value>admin</value>
> >   </property>
> >   <property>
> >
> > You only need to add this on the node where you start the job.
> >
> > Then I log in as user "hbase" on the machine that has this configuration.
> >
> > When I run my verify jobs as user "hbase", the job goes to the
> > fairscheduler pool "admin" and takes only the allowed number of mappers.
> >
> > Before that, the job took every mapper it could get.
> >
> > Regards Hansi
> >
> > > Sent: Wednesday, 21 May 2014 at 04:16
> > > From: "leiwangouc@gmail.com" <leiwangouc@gmail.com>
> > > To: user <user@pig.apache.org>, user <user@hbase.apache.org>
> > > Subject: How to set number of mappers when using HBaseStorage
> > >
> > >
> > > When using HBaseStorage to read data from an hbase table, there will
> > > be one mapper for each region.
> > > However, my hbase table has more than 1000 regions and I only have
> > > capacity for 80 mappers.
> > > Is there a way to set the number of mappers when using HBaseStorage?
> > >
> > > Thanks,
> > > Lei
> > >
> > >
> > >
> > > leiwangouc@gmail.com
> > >
> >
>
