hbase-user mailing list archives

From Sean Sechrist <ssechr...@gmail.com>
Subject Re: region, regionserver questions
Date Thu, 02 Dec 2010 22:50:02 GMT
Hey Albert,

If you use TableInputFormat, it will create one map task per region in that
table. So, each mapper should just talk to one regionserver.
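
To make the difference concrete, here is a rough, self-contained sketch in plain Java. It does not use the HBase API; Region, splitsPerRegion, regionsPerServer, exampleCluster and the server names rs1/rs2 are all made up for illustration, and the region counts are just the ones from your example (the counts for table T are assumed so that each server holds 30 regions). It models the one-split-per-region idea and compares it with grouping table U's regions by regionserver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitSketch {
    // Hypothetical stand-in for one region and the regionserver hosting it.
    public static final class Region {
        final String table;
        final String startKey;
        final String server;

        Region(String table, String startKey, String server) {
            this.table = table;
            this.startKey = startKey;
            this.server = server;
        }
    }

    // One split (and so one map task) per region of the requested table.
    public static List<Region> splitsPerRegion(List<Region> regions, String table) {
        List<Region> splits = new ArrayList<>();
        for (Region r : regions) {
            if (r.table.equals(table)) {
                splits.add(r);
            }
        }
        return splits;
    }

    // Number of regions each mapper would scan with one mapper per regionserver.
    public static Map<String, Integer> regionsPerServer(List<Region> regions, String table) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Region r : splitsPerRegion(regions, table)) {
            counts.merge(r.server, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed layout: rs1 holds 10 regions of U and 20 of T, rs2 holds 25 of U and 5 of T.
    public static List<Region> exampleCluster() {
        List<Region> regions = new ArrayList<>();
        addRegions(regions, "U", "rs1", 10);
        addRegions(regions, "T", "rs1", 20);
        addRegions(regions, "U", "rs2", 25);
        addRegions(regions, "T", "rs2", 5);
        return regions;
    }

    private static void addRegions(List<Region> out, String table, String server, int n) {
        for (int i = 0; i < n; i++) {
            out.add(new Region(table, String.format("%s-%s-%03d", table, server, i), server));
        }
    }

    public static void main(String[] args) {
        List<Region> cluster = exampleCluster();
        System.out.println("map tasks for U, one per region: "
                + splitsPerRegion(cluster, "U").size());
        System.out.println("regions of U per regionserver: "
                + regionsPerServer(cluster, "U"));
    }
}
```

With this layout the per-region approach yields 35 map tasks that each scan a single region, while the per-regionserver approach yields two mappers scanning 10 and 25 regions, which is the skew you are seeing.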


On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <ashau@yahoo-inc.com> wrote:

> Hi,
> I'm doing a distributed scan of an hbase table using map-reduce by taking
> all the regions belonging to a regionserver, and then assigning those
> regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> only talks to one regionserver).  However, doing it this way I'm getting
> some data skew.  For example, I have 2 tables U and T.  Each regionserver
> may have 30 regions, but one regionserver might have 10 regions from table U
> while another regionserver might have 25 regions from table U.  Is there a
> way to balance regions per table per regionserver (so that each regionserver
> has 15 regions from table U for example)?  Or should I just not worry about
> trying to have each individual mapper only talk to one regionserver?
> Also, how do regions get assigned to regionservers?  Is it based on data
> locality?  Region start/end keys?  Randomly?
> Thanks,
> Albert
