hadoop-mapreduce-dev mailing list archives

From Samaneh Shokuhi <samaneh.shok...@gmail.com>
Subject Re: configuring number of mappers and reducers
Date Tue, 09 Apr 2013 08:35:58 GMT
Thanks Sudhakara for your reply.
I did my experiments by varying the number of reducers, doubling it in each
experiment. I have a question regarding the response time. Suppose there
are 6 cluster nodes, and in the first experiment I have 3 reducers, doubled
to 6 in the second experiment and to 12 in the third. What do we expect to
see in response time? Should it change approximately like T, T/2, T/4, ...?
The response time I measure does not change like that; the decrease is more
like 2% or 3%. So I want to know: by increasing the number of reducers, how
much of a decrease in response time should we normally expect?
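
For a rough sanity check I sketched an Amdahl-style model (assuming, purely
for illustration, that the reduce phase is about 10% of total job time;
only that share shrinks as reducers are added):

    // Back-of-the-envelope model: only the reduce share shrinks with more
    // reducers (the ~10% share below is an assumption for illustration).
    public class SpeedupSketch {
        public static void main(String[] args) {
            double total = 100.0;   // total job time, arbitrary units
            double reduce = 10.0;   // assumed time spent in the reduce phase
            for (int r = 1; r <= 4; r *= 2) {
                double t = (total - reduce) + reduce / r;
                System.out.printf("reducers x%d -> %.1f units (%.1f%% faster)%n",
                        r, t, 100.0 * (1.0 - t / total));
            }
        }
    }

Under that assumption, even 4x the reducers buys only ~7.5%, which is in
the same ballpark as the small decrease I see.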

Samaneh


On Sun, Apr 7, 2013 at 7:53 PM, sudhakara st <sudhakara.st@gmail.com> wrote:

> Hi Samaneh,
>
> You can experiment with:
> 1. Varying the number of reducers (mapred.reduce.tasks).
>
> (Configure these parameters depending on your system capacity:)
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
> Tasktrackers have a fixed number of slots for map tasks and for reduce
> tasks. The precise number depends on the number of cores and the amount
> of memory on the tasktracker nodes; for example, a quad-core node with
> 8 GB of memory may be able to run 3 map tasks and 2 reduce tasks
> simultaneously (not a precise figure, it depends on what type of job you
> are running).
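>
> A minimal sketch of how this configuration fits together (values are
> illustrative only):
>
>     // Sketch, illustrative values only: the slot maximums are daemon-side
>     // settings placed in mapred-site.xml on every tasktracker (read at
>     // daemon startup, not per job). For a quad-core, 8 GB node:
>     //   mapred.tasktracker.map.tasks.maximum    = 3
>     //   mapred.tasktracker.reduce.tasks.maximum = 2
>     public class SlotCapacitySketch {
>         public static void main(String[] args) {
>             int nodes = 6, mapSlots = 3, reduceSlots = 2;
>             System.out.println("cluster map capacity:    " + nodes * mapSlots);    // 18
>             System.out.println("cluster reduce capacity: " + nodes * reduceSlots); // 12
>         }
>     }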
>
>
> The right number of reduces seems to be 0.95 or 1.75 * (nodes *
> mapred.tasktracker.reduce.tasks.maximum). At 0.95 all of the reduces can
> launch immediately and start transferring map outputs as the maps finish.
> At 1.75 the faster nodes will finish their first round of reduces and
> launch a second round of reduces, doing a much better job of load
> balancing.
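>
> As a worked example of that rule of thumb (illustrative numbers: the same
> 6 worker nodes with 2 reduce slots each as above):
>
>     // Worked example of the 0.95 / 1.75 rule of thumb.
>     // Numbers are illustrative: 6 worker nodes, 2 reduce slots per node.
>     public class ReducerCountSketch {
>         public static void main(String[] args) {
>             int nodes = 6;
>             int slotsPerNode = 2;  // mapred.tasktracker.reduce.tasks.maximum
>             int total = nodes * slotsPerNode;                     // 12 reduce slots
>             System.out.println("0.95 rule: " + (int) (0.95 * total)); // 11, one wave
>             System.out.println("1.75 rule: " + (int) (1.75 * total)); // 21, two waves
>         }
>     }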
>
> 2. These are some of the main job-tuning factors in terms of cluster
> resource utilization (CPU, memory, I/O, network) and response time; a
> configuration sketch follows after this list.
>    A) io.sort.mb
>       io.sort.record.percent
>       io.sort.spill.percent
>       io.sort.factor
>       mapred.reduce.parallel.copies
>
>    B) Compression of mapper and reducer outputs
>       mapred.map.output.compression.codec
>
>    C) Enabling/disabling speculative task execution
>       mapred.map.tasks.speculative.execution
>       mapred.reduce.tasks.speculative.execution
>
>    D) Enabling JVM reuse
>       mapred.job.reuse.jvm.num.tasks
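>
> A sketch of setting these knobs from a job driver (classic MR1 property
> names as above; every value here is illustrative, not a recommendation):
>
>     import org.apache.hadoop.io.compress.CompressionCodec;
>     import org.apache.hadoop.io.compress.DefaultCodec;
>     import org.apache.hadoop.mapred.JobConf;
>
>     public class TuningSketch {
>         public static void main(String[] args) {
>             JobConf job = new JobConf();
>             // A) sort/shuffle buffers and copy parallelism
>             job.setInt("io.sort.mb", 200);
>             job.setFloat("io.sort.record.percent", 0.05f);
>             job.setFloat("io.sort.spill.percent", 0.80f);
>             job.setInt("io.sort.factor", 25);
>             job.setInt("mapred.reduce.parallel.copies", 10);
>             // B) compress intermediate map output
>             job.setBoolean("mapred.compress.map.output", true);
>             job.setClass("mapred.map.output.compression.codec",
>                     DefaultCodec.class, CompressionCodec.class);
>             // C) speculative execution on/off
>             job.setBoolean("mapred.map.tasks.speculative.execution", true);
>             job.setBoolean("mapred.reduce.tasks.speculative.execution", false);
>             // D) JVM reuse: -1 reuses without limit; 1 runs one task per JVM
>             job.setInt("mapred.job.reuse.jvm.num.tasks", -1);
>         }
>     }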
>
>
> On Sun, Apr 7, 2013 at 10:31 PM, Samaneh Shokuhi
> <samaneh.shokuhi@gmail.com> wrote:
>
> > Thanks Sudhakara for your reply.
> > So if the number of mappers depends on the data size, maybe the best
> > way to do my experiments is to increase the number of reducers based on
> > the number of estimated blocks in the data file. Actually, I want to
> > know how the response time changes with the number of mappers and
> > reducers.
> > Any idea about how to do this kind of experiment?
> >
> > Samaneh
> >
> >
> > On Sun, Apr 7, 2013 at 6:29 PM, sudhakara st <sudhakara.st@gmail.com>
> > wrote:
> >
> > > Hi Samaneh,
> > >
> > > The number of map tasks for a given job is driven by the number of
> > > input splits in the input data. In the default configuration, a map
> > > task is spawned for each input split (one per block). Your 2.5 GB of
> > > data contains 44 blocks, therefore your job takes 44 map tasks. At
> > > minimum, with FileInputFormat derivatives, a job will have at least
> > > one map per file, and can have multiple maps per file if the file
> > > extends beyond a single block (file size is more than the block size).
> > > The *mapred.map.tasks* parameter is just a hint to the InputFormat for
> > > the number of maps; it does not have any effect if the number of
> > > blocks in the input data is more than the specified value. It is not
> > > possible to dictate the number of mappers that run for a job, but it
> > > is possible to explicitly specify the number of reducers by using the
> > > *mapred.reduce.tasks* property, as in the sketch below.
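> > >
> > > A minimal driver sketch (pre-2.x API; names illustrative) showing the
> > > difference between the hint and the hard setting:
> > >
> > >     import java.io.IOException;
> > >     import org.apache.hadoop.conf.Configuration;
> > >     import org.apache.hadoop.mapreduce.Job;
> > >
> > >     // Sketch: mapred.map.tasks is only a hint to the InputFormat,
> > >     // while the reducer count set below is honored exactly.
> > >     public class ReduceTasksSketch {
> > >         public static void main(String[] args) throws IOException {
> > >             Configuration conf = new Configuration();
> > >             conf.setInt("mapred.map.tasks", 12); // hint; actual count follows splits
> > >             Job job = new Job(conf, "wordcount");
> > >             job.setNumReduceTasks(12);           // sets mapred.reduce.tasks
> > >         }
> > >     }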
> > >
> > > The replication factor is not related in any way to the number of
> > > mappers and reducers.
> > >
> > > On Sun, Apr 7, 2013 at 7:38 PM, Samaneh Shokuhi
> > > <samaneh.shokuhi@gmail.com> wrote:
> > >
> > > > Hi All,
> > > > I am doing some experiments by running the WordCount example on
> > > > Hadoop. I have a cluster with 7 nodes. I want to run the WordCount
> > > > example with 3 mappers and 3 reducers and compare the response time
> > > > with further experiments where the number of mappers and reducers is
> > > > increased to 6, then 12, and so on.
> > > > For the first experiment I set the number of mappers and reducers to
> > > > 3 in the WordCount example source code, and also set the number of
> > > > replications to 3 in the Hadoop configuration. The maximum number of
> > > > tasks per node is set to 1.
> > > > But when I run the example with a big input like 2.5 GB, I can see
> > > > 44 map tasks and 3 reduce tasks running!
> > > >
> > > > What parameters do I need to set to get (3 mappers, 3 reducers),
> > > > (6M, 6R) and (12M, 12R)? As I mentioned, I have a cluster with 1
> > > > namenode and 6 datanodes.
> > > > Is the number of replications related to the number of mappers and
> > > > reducers?
> > > > Regards,
> > > > Samaneh
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > .....  Sudhakara.st
> > >
> >
>
>
>
> --
>
> Regards,
> .....  Sudhakara.st
>
