hadoop-mapreduce-dev mailing list archives

From Eric Payne <eric.payne1...@yahoo.com.INVALID>
Subject Re: Submit, suspend and resume a mapreduce job execution
Date Mon, 22 Aug 2016 16:11:14 GMT
If the only thing you want to do is make sure the reducers don't start until all of the maps
are done, you could set mapreduce.job.reduce.slowstart.completedmaps to 1.0. By default, it
is 0.05. This property defines the fraction of the number of maps in the job which should
be complete before reduces are scheduled for the job.
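As a rough illustration of what that fraction means in task counts, here is a minimal Java sketch. The class and method names are hypothetical and are not part of the Hadoop API, and rounding the product up to a whole task is an assumption about how the threshold is applied. If the job driver goes through ToolRunner, the property itself can usually be passed on the command line as -D mapreduce.job.reduce.slowstart.completedmaps=1.0.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
final class SlowstartThreshold {

    // Number of map tasks that must complete before reduces may be scheduled.
    // Assumption: the fraction of total maps is rounded up to a whole task.
    static int mapsRequired(int totalMaps, double completedMapsFraction) {
        if (totalMaps < 0 || completedMapsFraction < 0.0 || completedMapsFraction > 1.0) {
            throw new IllegalArgumentException("totalMaps must be >= 0 and fraction in [0, 1]");
        }
        return (int) Math.ceil(completedMapsFraction * totalMaps);
    }

    // True once enough maps have completed for reduces to be scheduled.
    static boolean reducesMayStart(int completedMaps, int totalMaps, double completedMapsFraction) {
        return completedMaps >= mapsRequired(totalMaps, completedMapsFraction);
    }

    public static void main(String[] args) {
        int totalMaps = 200;
        // With the default of 0.05, reduces become eligible early in the map phase.
        System.out.println("default 0.05: " + mapsRequired(totalMaps, 0.05) + " maps");
        // With 1.0, every map must finish first.
        System.out.println("1.0: " + mapsRequired(totalMaps, 1.0) + " maps");
    }
}
```

With the fraction set to 1.0, reducesMayStart stays false until the completed-map count equals the total, which is the behavior described above.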

If you want a job that does something between the maps and the reducers, the Tez framework
may work for you. I'm not extremely familiar with Tez, but I know that it allows arbitrary
DAGs to be defined. Porting from MapReduce to Tez would require some amount of work, though.

Hope that helps.
-Eric Payne

----- Original Message -----
From: xeon Mailinglist <xeonmailinglist@gmail.com>
To: dev@slider.incubator.apache.org; mapreduce-dev@hadoop.apache.org
Sent: Sunday, August 21, 2016 5:50 AM
Subject: Re: Submit, suspend and resume a mapreduce job execution

I know that it is not possible to suspend and resume a MapReduce job, but I
really need to find a workaround. I have looked at ChainedJobs and at
the CapacityScheduler, but I am really clueless about what to do.

The main goal was to suspend a job when the map tasks finish and the reduce
tasks start. I know that this is not possible, so I have created two jobs:
one that executes all the map tasks (Job 1), and another that executes
all the reduce tasks (Job 2). Since I can't start a job that runs only
reduce tasks, it was necessary to add an identity mapper before running the
reducers. So in the end, I have Job 1, which just executes all the map tasks, and
Job 2, which executes the identity mappers and the reduce tasks. But this
really kills performance. I wish I could find a way to obtain better
performance. I have thought about piping the output of Job 1 to Job 2,
but in the end I really need to stop the execution between these two jobs.
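The shape of this two-job workaround can be sketched in plain Java, without Hadoop, to show where the pause point sits. This is only an in-process illustration under stated assumptions: the class name, the word-count logic, and the tab-separated intermediate file are all hypothetical stand-ins for the real map output and the real jobs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-process illustration; this is not Hadoop code.
final class TwoStageWordCount {

    // Stage 1 ("Job 1"): run only the map step and persist its output.
    static void mapStage(List<String> inputLines, Path intermediate) throws IOException {
        List<String> records = new ArrayList<>();
        for (String line : inputLines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    records.add(word + "\t1"); // emit (word, 1)
                }
            }
        }
        Files.write(intermediate, records);
    }

    // Stage 2 ("Job 2"): read the persisted records unchanged (the identity
    // map), group them by key, and reduce each group by summing its values.
    static Map<String, Integer> reduceStage(Path intermediate) throws IOException {
        Map<String, Integer> totals = new TreeMap<>();
        for (String record : Files.readAllLines(intermediate)) {
            String[] kv = record.split("\t");
            totals.merge(kv[0], Integer.parseInt(kv[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path intermediate = Files.createTempFile("stage1-", ".tsv");
        mapStage(List.of("a b a", "b c"), intermediate);
        // The pipeline can stop here for as long as needed before stage 2 runs.
        System.out.println(reduceStage(intermediate));
        Files.delete(intermediate);
    }
}
```

The cost mentioned above shows up in the same place as in the sketch: the intermediate data has to be written out in full by the first stage and read back in full by the second.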

I have looked at the ChainedJobs and CapacityScheduler classes to see if I
could implement a way to suspend and resume a job, but I did not succeed.
Any ideas on how to emulate suspending a job?

Sorry to say this, but I am really desperate to find a solution.


On Wed, Feb 18, 2015 at 6:53 PM, Steve Loughran <stevel@hortonworks.com> wrote:

> Afraid not.
> When we suspend/resume a Slider application, what we are doing is shutting
> down the entire application, releasing all its YARN resources and killing
> the "Application Master". The MapReduce engine runs its AM for the
> duration of the job, building up lots of state in that AM as to what is
> happening. Tez runs for longer, but it can dynamically change cluster size
> based on load.
> "Hadoop pre-emption" is a mechanism by which your cluster can be set up so
> that higher priority workloads can cause containers of lower-priority jobs
> to get killed, "pre-empted". Maybe that could be useful.
> -Steve
> On 18 February 2015 at 17:22:57, xeonmailinglist (
> xeonmailinglist@gmail.com<mailto:xeonmailinglist@gmail.com>) wrote:
> Hi,
> I noticed that YARN does not suspend or resume a MapReduce job that it
> is executing. Then I found Apache Slider.
> Is it possible to submit a MapReduce job with Slider, and suspend and
> resume the job while it is executing?
> Thanks,
