hbase-user mailing list archives

From Sean Sechrist <ssechr...@gmail.com>
Subject Re: region, regionserver questions
Date Thu, 02 Dec 2010 22:50:02 GMT
Hey Albert,

If you use TableInputFormat, it creates one map task per region in that
table, so each mapper should only need to talk to one regionserver (the one
hosting its region).
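
To make the difference concrete, here is a rough sketch in plain Java (no
HBase calls) that compares the two ways of splitting the scan, using the
10 and 25 region counts from the question below. The class name, method
names, and server names (rs1, rs2) are made up for illustration, and it
assumes regions hold roughly the same amount of data, which may not be true
in practice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from HBase.
class SplitSkew {
    // One mapper per regionserver: each mapper scans every region of the
    // table hosted on its server, so work per mapper follows the region counts.
    static List<Integer> perServerSplits(Map<String, Integer> regionsByServer) {
        return new ArrayList<>(regionsByServer.values());
    }

    // One mapper per region: every mapper scans exactly one region.
    static List<Integer> perRegionSplits(Map<String, Integer> regionsByServer) {
        List<Integer> splits = new ArrayList<>();
        for (int count : regionsByServer.values()) {
            for (int i = 0; i < count; i++) {
                splits.add(1);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Region counts for table U, taken from the example in the question.
        Map<String, Integer> tableU = new LinkedHashMap<>();
        tableU.put("rs1", 10);
        tableU.put("rs2", 25);

        System.out.println("regions per mapper, one mapper per regionserver: "
                + perServerSplits(tableU));
        System.out.println("map tasks, one mapper per region: "
                + perRegionSplits(tableU).size());
    }
}
```

With one mapper per regionserver, the mapper on the second server has 2.5
times the regions to scan. With one mapper per region, every task covers a
single region, so the uneven region placement stops showing up as skew in
the per-task work.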

-Sean

On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <ashau@yahoo-inc.com> wrote:

> Hi,
>
> I'm doing a distributed scan of an hbase table using map-reduce by taking
> all the regions belonging to a regionserver, and then assigning those
> regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> only talks to one regionserver).  However, doing it this way I'm getting
> some data skew.  For example, I have 2 tables U and T.  Each regionserver
> may have 30 regions, but one regionserver might have 10 regions from table U
> while another regionserver might have 25 regions from table U.  Is there a
> way to balance regions per table per regionserver (so that each regionserver
> has 15 regions from table U for example)?  Or should I just not worry about
> trying to have each individual mapper only talk to one regionserver?
>
> Also, how do regions get assigned to regionservers?  Is it based on data
> locality?  Region start/end keys?  Randomly?
>
> Thanks,
> Albert
>
