hadoop-common-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [hadoop] steveloughran commented on pull request #2349: MAPREDUCE-7282. Move away from V2 commit algorithm
Date Thu, 01 Oct 2020 13:32:35 GMT

steveloughran commented on pull request #2349:
URL: https://github.com/apache/hadoop/pull/2349#issuecomment-702137474

   @jbrennan333 what do you think we should say instead of "deprecated"? "Not recommended"?

   I was thinking of adding a link to the JIRA and changing the issue text there to clarify:
   * safe if the names and content of generated output files are consistent across all task attempts
   * unsafe if different TAs generate bad files (biggest risk, as partial failure of 1st attempt
may leave)
   * unsafe if different TAs generate different content in the same files (only an issue on a
network partition where TA #1 generates output as/after TA #2 does its work).

   Cleanup of the job will delete the whole job attempt dir, so that's the maximum time that a
partitioned TA may commit work. There's no risk of some VM pausing for 3 hours, restarting,
and an in-progress TA completing its work and overwriting the final output. This is good.
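
   The three cases above can be illustrated with a toy model. This is only a sketch, not
Hadoop code: it assumes a v2-style task commit can be approximated as merging an attempt's
files straight into the destination, and the names `v2_commit` and `classify` are
hypothetical helpers invented for this example.

```python
def v2_commit(dest, attempt_files):
    """Toy model (assumption): a v2-style task commit merges files directly into dest."""
    dest.update(attempt_files)


def classify(attempts):
    """Classify a list of task attempts, each a dict of filename -> content."""
    if not attempts:
        return "safe"
    # Case 2: attempts disagree on which files they produce.
    if len({frozenset(a) for a in attempts}) > 1:
        return "unsafe: different file names"
    # Case 3: same file names, but the content differs between attempts.
    first = attempts[0]
    if any(a != first for a in attempts[1:]):
        return "unsafe: different content"
    # Case 1: names and content are consistent across all attempts.
    return "safe"


# A first attempt that leaves an extra file, followed by a clean second attempt.
dest = {}
v2_commit(dest, {"part-0000": "a", "stray-file": "junk"})
v2_commit(dest, {"part-0000": "a"})
# In this model the extra file from the first attempt is still in the destination.
leftover = sorted(dest)
```

   In this model the second attempt never removes what the first one wrote, which is why
inconsistent file names are the biggest risk.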
