hadoop-mapreduce-dev mailing list archives

From Chris Douglas <chris.doug...@gmail.com>
Subject Re: Understanding task commit/abort protocol
Date Mon, 13 Feb 2017 20:22:24 GMT
IIRC the call to commit is correct, if the OutputCommitter supports
partial commit. That was the idea: checkpoint the state of the reducer
and promote its output, so it picks up from that point when
rescheduled. It worked well in our experiments.
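
As a rough illustration of that idea (a minimal sketch, not the code from
the patch): the preempted path promotes output only when the committer
declares that it supports partial commit. The names SketchCommitter,
RecordingCommitter, PreemptionSketch and supportsPartialCommit() are
hypothetical and are not part of the Hadoop API. The abort fallback is an
assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output committer; NOT the Hadoop OutputCommitter API.
interface SketchCommitter {
    boolean supportsPartialCommit();
    void commitTask(String attemptId);
    void abortTask(String attemptId);
}

// Records the calls it receives so the chosen path can be inspected.
class RecordingCommitter implements SketchCommitter {
    private final boolean partial;
    final List<String> calls = new ArrayList<>();

    RecordingCommitter(boolean partial) {
        this.partial = partial;
    }

    public boolean supportsPartialCommit() {
        return partial;
    }

    public void commitTask(String attemptId) {
        calls.add("commit:" + attemptId);
    }

    public void abortTask(String attemptId) {
        calls.add("abort:" + attemptId);
    }
}

class PreemptionSketch {
    // On preemption, promote partial output only if the committer can handle it.
    // Falling back to abort is an assumption of this sketch.
    static void onPreempted(SketchCommitter committer, String attemptId) {
        if (committer.supportsPartialCommit()) {
            committer.commitTask(attemptId);
        } else {
            committer.abortTask(attemptId);
        }
    }
}
```

With a RecordingCommitter built with true, onPreempted records a commit
call; built with false, it records an abort call.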

This also included changes that allow intermediate output to be
written to non-local FileSystems, so the reduce could stream output
instead of first copying/merging locally. It significantly improved
runtimes for small/medium jobs (around 30%), though naively enabling
it could DDoS a NN. It made checkpoints very inexpensive.

Anyway, I'm unsure what to do with this. Augusto Souza picked up the
code a couple of years ago, but we couldn't sync up to merge the code
with the encrypted shuffle and polish a couple of rough points [1].
There's little risk in merging it, since at worst it's never enabled,
but we wanted to make sure it was hardened enough for production.

Is there interest in this? AFAIK MapReduce jobs are still very common
in production clusters, preemption is (now) often enabled, and it'd be
an interesting example for other YARN applications. However, we don't
have cycles to redo this work on trunk. We can offer review/guidance,
if someone wanted to dig into the MR pipeline and complete it. -C

[1] e.g., writing the task attempt ID into the IFile header. This
anticipated aggregation trees, but in practice is used only for
preemption during the shuffle phase. This isn't wrong, just verbose.

On Wed, Feb 8, 2017 at 6:52 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>> On 3 Feb 2017, at 20:02, Chris Douglas <chris.douglas@gmail.com> wrote:
>> It's been a long time, but IIRC this isn't going to be invoked. The AM
>> will never set the preempt flag in the umbilical, so the task will
>> never transition to this state.
>> MapReduce checkpoint/restart of reduce tasks was going to be part of
>> MAPREDUCE-5269, which signals a ReduceTask to promote its partial
>> output if both the Reducer and OutputCommitter are tagged as
>> @Checkpointable. If either is not, then the flag is never set. The
>> code that would have implemented this was not committed, so it's
>> really-really not going to be set. -C
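
The gating rule in the quoted paragraph above can be sketched with plain
Java reflection. This is only an illustration under stated assumptions:
the Checkpointable annotation below is a local stand-in declared for the
example, and CheckpointGate, TaggedReducer, PlainReducer, TaggedCommitter
and PlainCommitter are hypothetical names, not classes from the patch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in annotation for this sketch; RUNTIME retention is required
// so that the reflection check below can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Checkpointable {}

// Hypothetical reducer and committer classes, tagged and untagged.
@Checkpointable
class TaggedReducer {}

class PlainReducer {}

@Checkpointable
class TaggedCommitter {}

class PlainCommitter {}

class CheckpointGate {
    // The preempt flag may be set only if BOTH classes carry the annotation.
    static boolean mayCheckpoint(Class<?> reducer, Class<?> committer) {
        return reducer.isAnnotationPresent(Checkpointable.class)
            && committer.isAnnotationPresent(Checkpointable.class);
    }
}
```

If either class is untagged, mayCheckpoint returns false, which matches
the "flag is never set" case described above.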
> I didn't think it was being used, but thanks for clarifying this.
> Should that code snippet be culled? Or at least the abort operation to actually call
>> On Fri, Feb 3, 2017 at 6:41 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>>> In HADOOP-13786 I'm adding a new committer, one which writes to S3 without doing
>>> renames. It does this by submitting all the data to S3 targeted at the final destination,
>>> but doesn't send the POST needed to materialize it until the task commits. Abort the task
>>> and it cancels these pending commits.
>>> This algorithm should be robust provided that only one attempt for a task is
>>> committed, which comes down to:
>>> 1. Only those tasks which have succeeded are committed.
>>> 2. Those tasks which have not succeeded have their pending writes aborted.
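
The two rules above can be sketched with an in-memory model. This is a
minimal sketch, not the HADOOP-13786 committer: PendingCommitSketch and
its methods are hypothetical names, and a map and a set stand in for the
S3 uploads that stay invisible until the final POST is sent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory model of "upload now, materialize on task commit".
// Hypothetical class; it does not talk to S3.
class PendingCommitSketch {
    // attempt ID -> destination paths uploaded but not yet materialized
    private final Map<String, List<String>> pending = new HashMap<>();
    // destination paths that have been materialized
    private final Set<String> visible = new HashSet<>();

    // Data is targeted at its final destination but stays invisible.
    void upload(String attemptId, String destPath) {
        pending.computeIfAbsent(attemptId, k -> new ArrayList<>()).add(destPath);
    }

    // Rule 1: committing a succeeded attempt materializes its pending writes.
    void commitTask(String attemptId) {
        List<String> paths = pending.remove(attemptId);
        if (paths != null) {
            visible.addAll(paths);
        }
    }

    // Rule 2: aborting an attempt cancels its pending writes.
    void abortTask(String attemptId) {
        pending.remove(attemptId);
    }

    boolean isVisible(String destPath) {
        return visible.contains(destPath);
    }

    int pendingAttempts() {
        return pending.size();
    }
}
```

Nothing is visible at the destination until commitTask runs for an
attempt, and an aborted attempt leaves nothing behind.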
>>> Which is where I now have a question. In the class org.apache.hadoop.mapred.Task,
>>> OutputCommitter.commitTask() is called when a task is pre-empted:
>>>  public void done(TaskUmbilicalProtocol umbilical,
>>>                   TaskReporter reporter
>>>                   ) throws IOException, InterruptedException {
>>>    updateCounters();
>>>    if (taskStatus.getRunState() == TaskStatus.State.PREEMPTED ) {
>>>      // If we are preempted, do no output promotion; signal done and exit
>>>      committer.commitTask(taskContext);         /* HERE */
>>>      umbilical.preempted(taskId, taskStatus);
>>>      taskDone.set(true);
>>>      reporter.stopCommunicationThread();
>>>      return;
>>>    }
>>> That's despite the line above saying "do no output promotion", and, judging by
>>> its place in the code, looking like it's the handler for task preempted state.
>>> Shouldn't it be doing a task abort here?
>>> I suspect the sole reason this hasn't shown up as a problem before is that this
>>> is the sole use of TaskStatus.State.PREEMPTED in the hadoop code: this particular codepath
>>> is never executed. In which case, culling it may be the correct option.
>>> Thoughts?
>>> -Steve
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
>>> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
