From: GitBox
To: common-issues@hadoop.apache.org
Subject: [GitHub] [hadoop] steveloughran commented on pull request #2349: MAPREDUCE-7282. Move away from V2 commit algorithm
Date: Thu, 01 Oct 2020 13:32:35 -0000
Message-ID: <160155915591.32230.15535202842052601636.asfpy@gitbox.apache.org>

steveloughran commented on pull request #2349:
URL: https://github.com/apache/hadoop/pull/2349#issuecomment-702137474

@jbrennan333 what do you think we should say instead of deprecated? "not recommended"?
I was thinking of adding a link to the JIRA and changing the issue text there to clarify:

* safe if the names and content of generated output files are consistent across all task attempts (TAs)
* unsafe if different TAs generate bad files (biggest risk, as partial failure of 1st attempt may leave)
* unsafe if different TAs generate different content in the same files (only an issue on a network partition where TA #1 generates output as/after TA #2 does its work)

Cleanup of the job will delete the whole job attempt dir, so that bounds the time in which a partitioned TA may commit work. There's no risk of some VM pausing for 3 hours, restarting, and an in-progress TA completing its work and overwriting the final output. This is good.
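To make the three cases concrete, here is a toy model of a v2-style task commit, where each task attempt's files are merged straight into the destination directory. It is only a sketch: `V2CommitModel`, its methods, and the file names are made up for illustration, and none of this is the actual `FileOutputCommitter` code. It assumes a failed first attempt had already moved some of its files before dying.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not Hadoop code) of committing task output directly into the destination. */
final class V2CommitModel {
    /** Destination directory modelled as file name -> content. */
    private final Map<String, String> dest = new TreeMap<>();

    /** Full task commit: every file overwrites any same-named file already in the destination. */
    void commitTask(Map<String, String> attemptOutput) {
        dest.putAll(attemptOutput);
    }

    /** Commit that fails after moving only the first 'limit' files (in name order). */
    void commitTaskPartially(Map<String, String> attemptOutput, int limit) {
        new TreeMap<>(attemptOutput).entrySet().stream()
                .limit(limit)
                .forEach(e -> dest.put(e.getKey(), e.getValue()));
    }

    /** Snapshot of the destination directory. */
    Map<String, String> listing() {
        return new TreeMap<>(dest);
    }

    /** Attempt 1 fails part-way through its commit, then attempt 2 commits fully. */
    static Map<String, String> run(Map<String, String> attempt1, int movedBeforeFailure,
                                   Map<String, String> attempt2) {
        V2CommitModel model = new V2CommitModel();
        model.commitTaskPartially(attempt1, movedBeforeFailure);
        model.commitTask(attempt2);
        return model.listing();
    }

    public static void main(String[] args) {
        // Case 1: same names and content in both attempts; the retry repairs the partial commit.
        Map<String, String> same = Map.of("part-0000", "a", "part-0001", "b");
        System.out.println("consistent attempts: " + run(same, 1, same));

        // Case 2: attempts use different file names; the failed attempt's file is left behind.
        Map<String, String> first = Map.of("part-0000-attempt1", "a");
        Map<String, String> second = Map.of("part-0000-attempt2", "a");
        System.out.println("different names:     " + run(first, 1, second));

        // Case 3: same name, different content; whichever attempt commits last wins.
        V2CommitModel late = new V2CommitModel();
        late.commitTask(Map.of("part-0000", "from TA #2"));
        late.commitTask(Map.of("part-0000", "from TA #1 (partitioned, committing late)"));
        System.out.println("late overwrite:      " + late.listing());
    }
}
```

In this model the first case always converges on the second attempt's output, while the other two leave the destination in a state that no single successful attempt would have produced.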