hadoop-mapreduce-dev mailing list archives

From sudhakara st <sudhakara...@gmail.com>
Subject Re: configuring number of mappers and reducers
Date Sun, 07 Apr 2013 17:53:42 GMT
Hi Samaneh,

You can experiment with:
1. Varying the number of reducers (mapred.reduce.tasks).

Configure these parameters depending on your system capacity:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
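A mapred-site.xml sketch of those two settings (the values 3 and 2 are only example figures for a small node, not recommendations):

```xml
<!-- mapred-site.xml fragment: per-tasktracker slot limits.
     The values below are illustrative example figures only. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```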

Tasktrackers have a fixed number of slots for map tasks and for reduce
tasks. The precise number depends on the number of cores and the amount of
memory on the tasktracker nodes; for example, a quad-core node with 8 GB of
memory may be able to run 3 map tasks and 2 reduce tasks simultaneously
(not a precise figure, it depends on what type of job you are running).


The right number of reduces seems to be 0.95 or 1.75 * (nodes *
mapred.tasktracker.reduce.tasks.maximum). At 0.95 all of the reduces can
launch immediately and start transferring map outputs as the maps finish.
At 1.75 the faster nodes will finish their first round of reduces and
launch a second round, doing a much better job of load balancing.
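That rule of thumb can be sketched as a small calculation (a hypothetical helper, not a Hadoop API; the node and slot counts below are example values for a 6-datanode cluster):

```java
// Sketch of the 0.95 / 1.75 rule of thumb for choosing a reducer count.
// This is NOT a Hadoop API; nodes and slots-per-node are example values.
public class ReducerCount {
    static int suggested(int nodes, int reduceSlotsPerNode, double factor) {
        // factor = 0.95 (single wave of reduces) or
        // factor = 1.75 (two waves, better load balancing)
        return (int) (factor * nodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        int nodes = 6; // datanodes in the cluster (example)
        int slots = 2; // mapred.tasktracker.reduce.tasks.maximum (example)
        System.out.println(suggested(nodes, slots, 0.95)); // prints 11
        System.out.println(suggested(nodes, slots, 1.75)); // prints 21
    }
}
```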

2. These are some of the main job tuning factors in terms of cluster
resource utilization (CPU, memory, I/O, network) and response time.

   A) Sort and shuffle buffers
         io.sort.mb
         io.sort.record.percent
         io.sort.spill.percent
         io.sort.factor
         mapred.reduce.parallel.copies

   B) Compression of mapper and reducer outputs
         mapred.map.output.compression.codec

   C) Enabling/disabling speculative task execution
         mapred.map.tasks.speculative.execution
         mapred.reduce.tasks.speculative.execution

   D) Enabling JVM reuse
         mapred.job.reuse.jvm.num.tasks
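A mapred-site.xml sketch of the B), C), and D) knobs above (the chosen values are illustrative, not recommendations; mapred.compress.map.output is the flag that enables map-output compression alongside the codec property):

```xml
<!-- mapred-site.xml fragment; values are illustrative only. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value> <!-- -1 means reuse the JVM without limit -->
</property>
```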


On Sun, Apr 7, 2013 at 10:31 PM, Samaneh Shokuhi
<samaneh.shokuhi@gmail.com>wrote:

> Thanks Sudhakara for your reply.
> So if the number of mappers depends on the data size, maybe the best way to
> do my experiments is to increase the number of reducers based on the number
> of estimated blocks in the data file. Actually I want to know how response
> time changes with the number of mappers and reducers.
> Any idea about the way of doing this kind of experiment?
>
> Samaneh
>
>
> On Sun, Apr 7, 2013 at 6:29 PM, sudhakara st <sudhakara.st@gmail.com>
> wrote:
>
> > Hi Samaneh,
> >
> > The number of map tasks for a given job is driven by the number of input
> > splits in the input data. With the default configuration, one map task is
> > spawned per input split (one per block). So your 2.5 GB of data contains
> > 44 blocks, therefore your job takes 44 map tasks. At minimum, with
> > FileInputFormat derivatives, a job will have at least one map per file,
> > and can have multiple maps per file if it extends beyond a single block
> > (file size is more than the block size). The *mapred.map.tasks* parameter
> > is just a hint to the InputFormat for the number of maps; it does not
> > have any effect if the number of blocks in the input data is more than
> > the specified value. It is not possible to specify the number of mappers
> > that run for a job, but it is possible to explicitly specify the number
> > of reducers by using the *mapred.reduce.tasks* property.
> >
> > The replication factor is not related in any way to the number of
> > mappers and reducers.
> >
> >
> > On Sun, Apr 7, 2013 at 7:38 PM, Samaneh Shokuhi
> > <samaneh.shokuhi@gmail.com>wrote:
> >
> > > Hi All,
> > > I am doing some experiments by running the WordCount example on Hadoop.
> > > I have a cluster with 7 nodes. I want to run the WordCount example with
> > > 3 mappers and 3 reducers, and compare the response time with other
> > > experiments where the number of mappers and reducers is increased to 6,
> > > 12, and so on.
> > > For the first experiment I set the number of mappers and reducers to 3
> > > in the WordCount example source code, and also set the number of
> > > replications to 3 in the Hadoop configuration. Also, the maximum number
> > > of tasks per node is set to 1.
> > > But when I run the sample with big data like 2.5 GB, I can see 44 map
> > > tasks and 3 reduce tasks running!
> > >
> > > What parameters do I need to set to have (3 mappers, 3 reducers),
> > > (6M, 6R), and (12M, 12R)? As I mentioned, I have a cluster with 1
> > > namenode and 6 datanodes.
> > > Is the number of replications related to the number of mappers and
> > > reducers?
> > > Regards,
> > > Samaneh
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > .....  Sudhakara.st
> >
>
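To tie this to Samaneh's experiment: the reducer count discussed above can be set per job through *mapred.reduce.tasks*, e.g. as a config fragment (the value 3 matches her first run; the command line shown in the comment assumes the job driver parses generic options via ToolRunner/GenericOptionsParser):

```xml
<!-- Per-job reducer count (illustrative value for the 3-reducer run).
     It can also be passed on the command line, e.g.
       hadoop jar hadoop-examples.jar wordcount -Dmapred.reduce.tasks=3 in out
     when the driver uses ToolRunner/GenericOptionsParser. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>3</value>
</property>
```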



-- 

Regards,
.....  Sudhakara.st
