giraph-dev mailing list archives

From "Maja Kabiljo (JIRA)" <>
Subject [jira] [Created] (GIRAPH-1077) Jobs getting stuck after channel failure
Date Tue, 21 Jun 2016 18:53:57 GMT
Maja Kabiljo created GIRAPH-1077:

             Summary: Jobs getting stuck after channel failure
                 Key: GIRAPH-1077
             Project: Giraph
          Issue Type: Bug
            Reporter: Maja Kabiljo
            Assignee: Maja Kabiljo

Currently, when a channel fails we only log the failure. Because we don't wait on open
requests in every code path, the check for outstanding requests is not always called, and
we've seen jobs get stuck, for example during the input stage when a worker's request to the
master for a split to read fails. When we know that a channel has failed, we should try to
resend the requests that were sent on that channel.
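
A minimal sketch of the proposed behaviour, in Java since Giraph is a Java project. All names
here (OpenRequestTracker, requestSent, requestAcked, channelFailed, the integer channel ids
and the String payloads) are hypothetical and are not taken from Giraph's client code. The
only idea illustrated is that open requests are tracked per channel, so a channel-failure
callback can resend them right away instead of relying on a periodic check that may never run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical tracker of unacknowledged requests; not Giraph's actual API. */
class OpenRequestTracker {
  /** Open requests, keyed by channel id, then by request id. */
  private final Map<Integer, Map<Long, String>> openByChannel =
      new ConcurrentHashMap<>();

  /** Record a request as open at the moment it is written to a channel. */
  void requestSent(int channelId, long requestId, String payload) {
    openByChannel
        .computeIfAbsent(channelId, k -> new ConcurrentHashMap<>())
        .put(requestId, payload);
  }

  /** Forget a request once the receiver has acknowledged it. */
  void requestAcked(int channelId, long requestId) {
    Map<Long, String> open = openByChannel.get(channelId);
    if (open != null) {
      open.remove(requestId);
    }
  }

  /**
   * Intended to be called from the channel-failure callback. Moves every
   * request still open on the failed channel to the replacement channel
   * and passes each one to the resender. Returns how many were resent.
   */
  int channelFailed(int failedChannelId, int newChannelId,
      BiConsumer<Long, String> resender) {
    Map<Long, String> open = openByChannel.remove(failedChannelId);
    if (open == null) {
      return 0;
    }
    int resent = 0;
    for (Map.Entry<Long, String> entry : open.entrySet()) {
      requestSent(newChannelId, entry.getKey(), entry.getValue());
      resender.accept(entry.getKey(), entry.getValue());
      resent++;
    }
    return resent;
  }

  /** Number of requests still awaiting acknowledgement on a channel. */
  int openCount(int channelId) {
    Map<Long, String> open = openByChannel.get(channelId);
    return open == null ? 0 : open.size();
  }
}
```

With something like this in place, a failed split request from a worker to the master would
be resent as soon as the failure is observed, whether or not any code is waiting on open
requests at that moment.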

This message was sent by Atlassian JIRA
