giraph-dev mailing list archives

From "Maja Kabiljo (JIRA)" <>
Subject [jira] [Resolved] (GIRAPH-1077) Jobs getting stuck after channel failure
Date Thu, 23 Jun 2016 15:46:16 GMT


Maja Kabiljo resolved GIRAPH-1077.
    Resolution: Fixed

> Jobs getting stuck after channel failure
> ----------------------------------------
>                 Key: GIRAPH-1077
>                 URL:
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
> When a channel fails, we currently just log the failure. Because we don't wait on open
requests everywhere, the check for outstanding requests is not always called, and we've seen
jobs stay stuck, for example during the input stage, when a worker's request to the master
for a split to read fails. When we know that a channel has failed, we should try to resend
the requests that were sent on that channel.
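
The proposed behavior (resend on a known channel failure instead of only logging it) can be sketched roughly as below. This is a minimal illustration, not Giraph's actual client code: the class `OpenRequestTracker`, its methods, the integer channel IDs, and the string payloads are all hypothetical names chosen for the example, and the real fix may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical tracker of unacknowledged requests; not part of Giraph. */
public class OpenRequestTracker {
  /** An unacknowledged request and the channel it was written to. */
  private static final class OpenRequest {
    final int channelId;
    final String payload;

    OpenRequest(int channelId, String payload) {
      this.channelId = channelId;
      this.payload = payload;
    }
  }

  /** Open requests keyed by request ID. */
  private final Map<Long, OpenRequest> openRequests = new ConcurrentHashMap<>();
  /** Callback that actually resends (requestId, payload); assumed to pick a live channel. */
  private final BiConsumer<Long, String> resender;

  public OpenRequestTracker(BiConsumer<Long, String> resender) {
    this.resender = resender;
  }

  /** Record a request as open once it has been written to a channel. */
  public void sent(long requestId, int channelId, String payload) {
    openRequests.put(requestId, new OpenRequest(channelId, payload));
  }

  /** Drop a request once the remote side has acknowledged it. */
  public void acknowledged(long requestId) {
    openRequests.remove(requestId);
  }

  /**
   * React to a known channel failure by resending every request that is
   * still open on that channel, rather than waiting for a periodic check.
   * Requests stay open until they are acknowledged.
   */
  public List<Long> onChannelFailure(int failedChannelId) {
    List<Long> resent = new ArrayList<>();
    for (Map.Entry<Long, OpenRequest> entry : openRequests.entrySet()) {
      if (entry.getValue().channelId == failedChannelId) {
        resender.accept(entry.getKey(), entry.getValue().payload);
        resent.add(entry.getKey());
      }
    }
    return resent;
  }

  /** Number of requests still waiting for an acknowledgement. */
  public int openCount() {
    return openRequests.size();
  }
}
```

The point of the sketch is that the resend is driven by the failure event itself, so a lost request (such as a worker's split request to the master) does not depend on some other code path happening to check open requests.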

This message was sent by Atlassian JIRA
