Date: Wed, 23 Aug 2017 15:53:01 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: giraph-dev@incubator.apache.org
Subject: [jira] [Commented] (GIRAPH-1139) Resuming from checkpoint doesn't work

    [ https://issues.apache.org/jira/browse/GIRAPH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138529#comment-16138529 ]

ASF GitHub Bot commented on GIRAPH-1139:
----------------------------------------

Github user neggert commented on the issue:

    https://github.com/apache/giraph/pull/30

    Ping @edunov

> Resuming from checkpoint doesn't work
> -------------------------------------
>
>                 Key: GIRAPH-1139
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1139
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp
>    Affects Versions: 1.2.0
>            Reporter: Nic Eggert
>
> I ran into a couple of issues when trying to get Giraph to resume from checkpoints (using mapreduce.max.attempts rather than GiraphJobRetryChecker).
> * If we just wrote a checkpoint, the master expects the workers to checkpoint again, while the workers (correctly) clear the checkpointing flag.
> * When workers restart, they take their task id from the partition number, which stays the same across multiple attempts. This gets transferred to the Netty clientId, and the server starts ignoring messages from restarted workers because it thinks it processed them already.
>
> I believe I've fixed these issues. I'll send a GitHub PR shortly.
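
For readers of the archive, the second issue in the quoted description (a restarted worker reusing its client id) can be illustrated with a small sketch. Everything here is hypothetical: `DedupSketch`, `Server`, `clientIdFromPartition`, and `clientIdWithAttempt` are made-up names, not Giraph or Netty classes, and folding the attempt number into the id is only one plausible remedy, not necessarily what the pull request does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these are not Giraph's actual classes.
class DedupSketch {
    /** Server-side filter that drops any (clientId, requestId) pair it has already seen. */
    static class Server {
        private final Map<Integer, Set<Long>> seen = new HashMap<>();

        /** Returns true if the request is processed, false if it is dropped as a duplicate. */
        boolean receive(int clientId, long requestId) {
            return seen.computeIfAbsent(clientId, k -> new HashSet<>()).add(requestId);
        }
    }

    /** Client id taken from the partition number alone: identical on every attempt. */
    static int clientIdFromPartition(int partition) {
        return partition;
    }

    /** Assumed remedy: fold the attempt number into the id so each attempt is a distinct client. */
    static int clientIdWithAttempt(int partition, int attempt, int maxPartitions) {
        return attempt * maxPartitions + partition;
    }

    public static void main(String[] args) {
        Server server = new Server();
        int partition = 3;
        // First attempt sends request 0, which the server processes.
        System.out.println(server.receive(clientIdFromPartition(partition), 0L));
        // The restarted worker reuses the id and restarts its request counter, so the request is dropped.
        System.out.println(server.receive(clientIdFromPartition(partition), 0L));
        // With an attempt-aware id, the restarted worker looks like a new client.
        System.out.println(server.receive(clientIdWithAttempt(partition, 1, 100), 0L));
    }
}
```

In this sketch the second call is rejected because the server cannot tell a restarted worker from a retransmission by the original one; any scheme that makes the id unique per attempt avoids that.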