hadoop-mapreduce-dev mailing list archives

From Samaneh Shokuhi <samaneh.shok...@gmail.com>
Subject Re: configuring number of mappers and reducers
Date Tue, 09 Apr 2013 20:48:10 GMT
Sudhakara, thanks again for the information.

Actually, the reason I am focused on response time is that I am going to
modify Hadoop to skip the sort phase in the map task, run a sample such as
the WordCount example on the modified Hadoop (with the sort skipped in the
map task), and compare its performance with unmodified Hadoop. In fact I
need to know how the sorting step affects performance, and whether in some
cases we can skip the sort in the map phase and get better performance.
So to do this experiment I need a way to measure the performance. I wonder
whether response time is a suitable metric in this case or not. Can you
suggest a way to measure the performance in this experiment?
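
Right now the only measurement I am taking is the wall-clock time of the
whole job from the driver, something like the sketch below (old mapred API;
the reducer count and the input/output paths are just placeholders that I
change per experiment):

  import java.io.IOException;
  import java.util.Iterator;
  import java.util.StringTokenizer;

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;

  public class TimedWordCount {

    public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();
      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> out, Reporter r) throws IOException {
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
          word.set(it.nextToken());
          out.collect(word, ONE);
        }
      }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> out, Reporter r) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        out.collect(key, new IntWritable(sum));
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(TimedWordCount.class);
      conf.setJobName("timed-wordcount");
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);
      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);
      conf.setNumReduceTasks(3);            // varied per experiment (3, 6, 12, ...)
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      // "Response time" here = wall-clock time from submission to completion.
      long start = System.currentTimeMillis();
      JobClient.runJob(conf);               // blocks until the job finishes
      long elapsedMs = System.currentTimeMillis() - start;
      System.out.println("Job wall-clock time: " + elapsedMs + " ms");
    }
  }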

Samaneh




On Tue, Apr 9, 2013 at 5:43 PM, sudhakara st <sudhakara.st@gmail.com> wrote:

> Hi Samaneh,
>
> Increasing the number of reducers for a job will not help as much as you
> are expecting. In most MR jobs more than 60% of the time is spent in the
> map phase (it depends on what type of operation is performed on the data
> in the map and reduce phases).
>
> Increasing the number of reduces increases the framework overhead, but it
> also improves load balancing, the use of the available map/reduce slots,
> and system resource utilization. By taking the job's processing
> requirements into account, we can tune a job for the best performance
> while lowering the cost of failures.
>
> One more thing I do not understand is why you are so worried about
> response time. The response time depends purely on how much data you are
> processing in the job, what type of operation is performed on the data,
> how the data is distributed in the cluster, and the capacity of your
> cluster. An MR job can be said to be optimized when it has a balanced
> number of mappers and reducers. For a typical MR application like word
> count I suggest a mapper-to-reducer ratio of 4:1 if the job runs without
> a combiner; for a word-count-like program with a combiner defined, I
> would suggest 10:1.
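> For example, with the 44 map tasks that your 2.5 GB input produced, a 4:1
> ratio works out to roughly 11 reducers, and 10:1 with a combiner to
> roughly 4.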
>
> While tuning MR jobs we cannot consider response time as the only
> parameter to optimize; there are many other factors to consider, and the
> response time does not depend only on the number of reducers configured
> for the job, it depends on the numerous other factors mentioned above.
>
>
>
> On Tue, Apr 9, 2013 at 2:05 PM, Samaneh Shokuhi <samaneh.shokuhi@gmail.com> wrote:
>
> > Thanks Sudhakara for your reply.
> > I did my experiments by varying the number of reducers, doubling it in
> > each experiment. I have a question regarding the response time. Suppose
> > there are 6 cluster nodes: in the first experiment I have 3 reducers,
> > it is doubled to 6 in the second experiment, and it is 12 in the third.
> > What do we expect to see in the response time? Should it change
> > approximately like T, T/2, T/4, ...?
> > The response time I measure does not change like that; the decrease is
> > about 2% or 3%. So I want to know, by increasing the number of
> > reducers, how much decrease in response time should we normally expect?
> >
> > Samaneh
> >
> >
> > On Sun, Apr 7, 2013 at 7:53 PM, sudhakara st <sudhakara.st@gmail.com>
> > wrote:
> >
> > > Hi Samaneh,
> > >
> > > You can experiment with:
> > > 1. Varying the number of reducers (mapred.reduce.tasks).
> > >
> > > Configure these parameters according to your system capacity:
> > > mapred.tasktracker.map.tasks.maximum
> > > mapred.tasktracker.reduce.tasks.maximum
> > >
> > > Tasktrackers have a fixed number of slots for map tasks and for
> > > reduce tasks. The precise number depends on the number of cores and
> > > the amount of memory on the tasktracker nodes; for example, a
> > > quad-core node with 8 GB of memory may be able to run 3 map tasks and
> > > 2 reduce tasks simultaneously (not precise, it depends on what type
> > > of job you are running).
> > >
> > >
> > > The right number of reduces seems to be 0.95 or 1.75 * (nodes *
> > > mapred.tasktracker.reduce.tasks.maximum). At 0.95 all of the reduces
> > > can launch immediately and start transferring map outputs as the maps
> > > finish. At 1.75 the faster nodes will finish their first round of
> > > reduces and launch a second round of reduces, doing a much better job
> > > of load balancing.
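> > >
> > > For example, a rough sketch of applying that rule of thumb in a job
> > > driver (the node and slot counts below are only placeholders for your
> > > cluster):
> > >
> > >   import org.apache.hadoop.mapred.JobConf;
> > >
> > >   public class ReducerCountHint {
> > >     public static void main(String[] args) {
> > >       int nodes = 6;               // tasktracker (worker) nodes in the cluster
> > >       int reduceSlotsPerNode = 2;  // mapred.tasktracker.reduce.tasks.maximum
> > >
> > >       // 0.95: all reduces run in one wave; 1.75: two waves, better balancing
> > >       int reducers = (int) (0.95 * nodes * reduceSlotsPerNode);
> > >
> > >       JobConf conf = new JobConf(ReducerCountHint.class);
> > >       conf.setNumReduceTasks(reducers);   // sets mapred.reduce.tasks
> > >       System.out.println("Reducers for this job: " + reducers);
> > >     }
> > >   }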
> > >
> > > 2. These are the main job tuning factors in terms of cluster resource
> > > utilization (CPU, memory, I/O, network) and response time; a sketch
> > > of setting some of them follows the list.
> > >    A) Map-side sort and shuffle
> > >          io.sort.mb
> > >          io.sort.record.percent
> > >          io.sort.spill.percent
> > >          io.sort.factor
> > >          mapred.reduce.parallel.copies
> > >
> > >    B) Compression of mapper and reducer outputs
> > >          mapred.map.output.compression.codec
> > >
> > >    C) Enabling/disabling speculative task execution
> > >          mapred.map.tasks.speculative.execution
> > >          mapred.reduce.tasks.speculative.execution
> > >
> > >    D) Enabling JVM reuse
> > >          mapred.job.reuse.jvm.num.tasks
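> > >
> > > A minimal sketch of setting the per-job properties among these on a
> > > JobConf (the values are only illustrative, not recommendations):
> > >
> > >   import org.apache.hadoop.mapred.JobConf;
> > >
> > >   public class TuningExample {
> > >     public static void main(String[] args) {
> > >       JobConf conf = new JobConf();
> > >       conf.setInt("io.sort.mb", 200);                      // map-side sort buffer, in MB
> > >       conf.setInt("io.sort.factor", 50);                   // streams merged in one pass
> > >       conf.setInt("mapred.reduce.parallel.copies", 10);    // parallel shuffle copies per reduce
> > >       conf.setBoolean("mapred.compress.map.output", true); // enable map output compression
> > >       conf.set("mapred.map.output.compression.codec",
> > >                "org.apache.hadoop.io.compress.DefaultCodec");
> > >       conf.setBoolean("mapred.map.tasks.speculative.execution", true);
> > >       conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
> > >       conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);   // -1 = reuse one JVM for all of a job's tasks
> > >       System.out.println("io.sort.mb = " + conf.get("io.sort.mb"));
> > >     }
> > >   }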
> > >
> > >
> > > On Sun, Apr 7, 2013 at 10:31 PM, Samaneh Shokuhi <samaneh.shokuhi@gmail.com> wrote:
> > >
> > > > Thanks Sudhakara for your reply.
> > > > So if the number of mappers depends on the data size, maybe the
> > > > best way to do my experiments is to increase the number of reducers
> > > > based on the estimated number of blocks in the data file. Actually,
> > > > I want to know how the response time changes when the number of
> > > > mappers and reducers is changed.
> > > > Any idea about how to do this kind of experiment?
> > > >
> > > > Samaneh
> > > >
> > > >
> > > > On Sun, Apr 7, 2013 at 6:29 PM, sudhakara st <sudhakara.st@gmail.com> wrote:
> > > >
> > > > > Hi Samaneh,
> > > > >
> > > > > The number of map tasks for a given job is driven by the number
> > > > > of input splits in the input data. Ideally, in the default
> > > > > configuration, one map task is spawned for each input split (that
> > > > > is, for each block). Your 2.5 GB of data contains 44 blocks,
> > > > > therefore your job runs 44 map tasks. At minimum, with
> > > > > FileInputFormat derivatives, a job will have at least one map per
> > > > > file, and can have multiple maps per file if a file extends
> > > > > beyond a single block (file size larger than the block size). The
> > > > > *mapred.map.tasks* parameter is just a hint to the InputFormat
> > > > > about the number of maps; it has no effect if the number of
> > > > > blocks in the input data is greater than the specified value. So
> > > > > it is not possible to force the number of mappers for a job, but
> > > > > it is possible to explicitly specify the number of reducers for a
> > > > > job using the *mapred.reduce.tasks* property.
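> > > > >
> > > > > Roughly, in driver code that difference looks like the sketch
> > > > > below (old mapred API; 12 is just an example value and 44 is the
> > > > > block count of your input):
> > > > >
> > > > >   import org.apache.hadoop.mapred.JobConf;
> > > > >
> > > > >   public class MapVsReduceCount {
> > > > >     public static void main(String[] args) {
> > > > >       JobConf conf = new JobConf();
> > > > >       conf.setNumMapTasks(12);    // mapred.map.tasks: only a hint to the
> > > > >                                   // InputFormat, ignored when the input
> > > > >                                   // has more splits (44 blocks here)
> > > > >       conf.setNumReduceTasks(12); // mapred.reduce.tasks: honored exactly
> > > > >       System.out.println(conf.getNumMapTasks() + " maps (hint), "
> > > > >           + conf.getNumReduceTasks() + " reduces (enforced)");
> > > > >     }
> > > > >   }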
> > > > >
> > > > > The replication factor is not related in any way to the number of
> > > > > mappers and reducers.
> > > > >
> > > > >
> > > > > On Sun, Apr 7, 2013 at 7:38 PM, Samaneh Shokuhi <samaneh.shokuhi@gmail.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > > I am doing some experiments by running the WordCount example on
> > > > > > Hadoop. I have a cluster with 7 nodes. I want to run the
> > > > > > WordCount example with 3 mappers and 3 reducers and compare the
> > > > > > response time with other experiments in which the number of
> > > > > > mappers and reducers is increased to 6, 12, and so on.
> > > > > > For the first experiment I set the number of mappers and
> > > > > > reducers to 3 in the WordCount example source code, and also
> > > > > > set the replication factor to 3 in the Hadoop configuration.
> > > > > > The maximum number of tasks per node is set to 1.
> > > > > > But when I run the sample with big data, about 2.5 GB, I can
> > > > > > see 44 map tasks and 3 reduce tasks running!
> > > > > >
> > > > > > What parameters do I need to set to get (3 mappers, 3 reducers),
> > > > > > (6M, 6R), and (12M, 12R)? As I mentioned, I have a cluster with
> > > > > > 1 namenode and 6 datanodes.
> > > > > > Is the replication factor related to the number of mappers and
> > > > > > reducers?
> > > > > > Regards,
> > > > > > Samaneh
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Regards,
> > > > > .....  Sudhakara.st
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > .....  Sudhakara.st
> > >
> >
>
>
>
> --
>
> Regards,
> .....  Sudhakara.st
>
