hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Doubt in HBase
Date Thu, 20 Aug 2009 18:34:45 GMT
On Thu, Aug 20, 2009 at 9:42 AM, john smith <js1987.smith@gmail.com> wrote:

> Hi all,
> I have one small doubt. Kindly answer it even if it sounds silly.

No question is silly. Don't worry.

> I am using MapReduce with HBase in distributed mode. I have a table which
> spans 5 region servers. I am using TableInputFormat to read the data
> from the table in the map. When I run the program, how many map tasks
> are created by default? Is it one per region server or more?

If you set the number of map tasks to a high number (at least the number of
regions in the table), it automatically spawns one map task for each region
(not for each region server). Otherwise, it spawns the number you have
explicitly specified in the job.
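The rule above can be modeled without a cluster. The Java sketch below is a hypothetical simulation, not HBase's actual TableInputFormat code. It assumes the behavior described for the old `mapred` TableInputFormatBase: the number of splits is the smaller of the requested map count and the region count, and when there are fewer splits than regions, neighbouring regions are grouped as evenly as possible. Regions are identified by index only; `RegionSplitSketch` and `assignRegions` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplitSketch {
    // Hypothetical model: regions are numbered 0..regions-1.
    // Returns, for each map task, the region indices that task would read.
    public static List<List<Integer>> assignRegions(int regions, int requestedMaps) {
        List<List<Integer>> tasks = new ArrayList<>();
        if (regions <= 0) {
            return tasks; // empty table: nothing to split
        }
        // Never create more splits than there are regions (assumed rule).
        int splits = Math.min(regions, Math.max(1, requestedMaps));
        for (int i = 0; i < splits; i++) {
            tasks.add(new ArrayList<>());
        }
        // Spread regions as evenly as possible, keeping neighbouring regions together.
        int base = regions / splits;
        int extra = regions % splits;
        int next = 0;
        for (int t = 0; t < splits; t++) {
            int count = base + (t < extra ? 1 : 0);
            for (int k = 0; k < count; k++) {
                tasks.get(t).add(next++);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // 12 regions; which of the 5 region servers hosts them does not matter here.
        System.out.println(assignRegions(12, 100).size()); // 12 (one map task per region)
        System.out.println(assignRegions(12, 4));          // [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    }
}
```

The point of the model is that the unit of parallelism is the region, so a table spread over 5 region servers can still produce many more than 5 map tasks.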

> Also, after the map phase is over, the reduce phase takes a bit more time.
> Is it due to moving the map output across the region servers, i.e. moving
> the values of the same key to a particular reducer before it can start? Is
> there any way I can optimize the code (e.g. by storing the data for the
> same reducer nearby)?

Increase the number of reducers. Each reducer will then have less data to move.
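Why more reducers means less data per reducer can be illustrated with a small simulation. The Java sketch below assumes the default key routing of Hadoop's HashPartitioner (non-negative hash of the key modulo the reducer count), so all values for one key still land on a single reducer. The map output (10,000 records over 1,000 hypothetical row keys) and the class and method names are illustrative only. Note that the total amount shuffled stays the same; it is just split across more reducers working in parallel.

```java
import java.util.ArrayList;
import java.util.List;

public class ReducerLoadSketch {
    // Same routing formula as Hadoop's default HashPartitioner.
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many map-output records each reducer would fetch in the shuffle.
    public static int[] recordsPerReducer(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[partition(key, numReducers)]++;
        }
        return counts;
    }

    // Largest per-reducer load, i.e. the reducer that finishes its shuffle last.
    public static int max(int[] values) {
        int m = 0;
        for (int v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical map output: 10,000 records with 1,000 distinct row keys.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add("row-" + (i % 1000));
        }
        for (int reducers : new int[] {1, 5, 20}) {
            System.out.println(reducers + " reducer(s): busiest one fetches "
                + max(recordsPerReducer(keys, reducers)) + " records");
        }
    }
}
```

In a real job the reducer count is set on the job configuration, e.g. with `setNumReduceTasks(int)` on `JobConf` or `Job`; the gain only materializes if the cluster has free reduce slots to run the extra reducers concurrently.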

> Thanks :)
