giraph-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-1077) Jobs getting stuck after channel failure
Date Tue, 21 Jun 2016 23:28:57 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343028#comment-15343028 ]

Hudson commented on GIRAPH-1077:
--------------------------------

FAILURE: Integrated in Giraph-trunk-Commit #1626 (See [https://builds.apache.org/job/Giraph-trunk-Commit/1626/])
GIRAPH-1077: Jobs getting stuck after channel failure (majakabiljo: [http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=51f09376456ed8dadc2e801afaa495863fd7ee3b])
* giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java


> Jobs getting stuck after channel failure
> ----------------------------------------
>
>                 Key: GIRAPH-1077
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1077
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>
> Currently, when a channel fails, we just log the failure. Since we don't wait on open requests
> from every place, the check of open requests doesn't always get called, and we've seen jobs
> get stuck, for example during the input stage when a worker's request to the master for a
> split to read fails. When we know that a channel has failed, we should try to resend the
> requests from that channel.
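The proposed fix (resend a failed channel's requests as soon as the failure is known, instead of waiting for a periodic request check) can be sketched as below. This is a minimal illustration under stated assumptions, not the code from the commit above: the class, the Request and Pending types, the string channel ids and the transport function are all hypothetical and are not Giraph's NettyClient API. The idea is to remember which channel each unacknowledged request was written to, and to drive the resend from the channel-failure callback itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Giraph's NettyClient API): remember which channel
 * each unacknowledged request was written to, and when a channel is reported
 * as failed, resend that channel's requests instead of only logging.
 */
class ResendOnChannelFailure {

  /** Illustrative request: an id used to match responses, plus a payload. */
  static final class Request {
    final long requestId;
    final String payload;

    Request(long requestId, String payload) {
      this.requestId = requestId;
      this.payload = payload;
    }
  }

  /** An outstanding request and the id of the channel it was sent on. */
  private static final class Pending {
    final String channelId;
    final Request request;

    Pending(String channelId, Request request) {
      this.channelId = channelId;
      this.request = request;
    }
  }

  /** Unacknowledged requests keyed by request id. */
  private final Map<Long, Pending> outstanding = new ConcurrentHashMap<>();

  /** Assumed transport: writes the request and returns the channel id used. */
  private final Function<Request, String> transport;

  ResendOnChannelFailure(Function<Request, String> transport) {
    this.transport = transport;
  }

  /** Sends a request and records it as outstanding on the channel used. */
  void send(Request request) {
    String channelId = transport.apply(request);
    outstanding.put(request.requestId, new Pending(channelId, request));
  }

  /** A response arrived, so the request no longer needs tracking. */
  void onResponse(long requestId) {
    outstanding.remove(requestId);
  }

  /**
   * Called from the channel-failure callback. Removes every request still
   * pending on the failed channel and sends it again (the transport is
   * assumed to pick a healthy channel). Returns how many were resent.
   */
  int onChannelFailure(String failedChannelId) {
    List<Request> toResend = new ArrayList<>();
    Iterator<Pending> it = outstanding.values().iterator();
    while (it.hasNext()) {
      Pending pending = it.next();
      if (pending.channelId.equals(failedChannelId)) {
        toResend.add(pending.request);
        it.remove();
      }
    }
    for (Request request : toResend) {
      send(request);
    }
    return toResend.size();
  }

  int outstandingCount() {
    return outstanding.size();
  }

  /** Channel a request is currently pending on, or null if acknowledged. */
  String channelOf(long requestId) {
    Pending pending = outstanding.get(requestId);
    return pending == null ? null : pending.channelId;
  }
}
```

Because the resend is triggered directly by the failure notification, a request such as a worker's split request no longer depends on some caller waiting on open requests before it is retried. One design consequence worth noting: if the original request actually reached the other side before the channel failed, it will be delivered twice, so a receiver would need to tolerate or deduplicate repeated request ids.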



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
