giraph-dev mailing list archives

From "Sergey Edunov (JIRA)"
Subject [jira] [Created] (GIRAPH-972) Race condition in checkpointing
Date Thu, 18 Dec 2014 18:50:13 GMT
Sergey Edunov created GIRAPH-972:

             Summary: Race condition in checkpointing
                 Key: GIRAPH-972
             Project: Giraph
          Issue Type: Bug
            Reporter: Sergey Edunov

A couple of issues were noticed with checkpointing of large jobs:
1) The task ID of the master appears to be significant. In most cases it is 0, but sometimes it
is not, and since we cannot control it, checkpointing should not depend on it.

2) A race condition occurs on the master when a worker dies:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411061513.38895_0001/_applicationAttemptsDir/0/_superstepDir/9/_workerHealthyDir/hadoop4921.prn2.facebook.com_3
	at org.apache.zookeeper.KeeperException.create(
	at org.apache.zookeeper.KeeperException.create(
	at org.apache.zookeeper.ZooKeeper.getData(
	at org.apache.zookeeper.ZooKeeper.getData(
	at org.apache.giraph.zk.ZooKeeperExt.getData(
	at org.apache.giraph.utils.WritableUtils.readFieldsFromZnode(
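
One possible way to tolerate this kind of race is sketched below. This is only an illustration, not the actual Giraph fix: FakeZk, NoNodeException, readIfPresent and readHealthyWorkers are hypothetical names, and the in-memory store merely stands in for ZooKeeper. The idea is that a znode which disappears between listing the healthy workers and reading their data is treated as "worker gone" rather than as a fatal error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; none of these types are Giraph or ZooKeeper APIs. */
class HealthyWorkerReader {

  /** Hypothetical analogue of ZooKeeper's NoNode error. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  /** Minimal in-memory stand-in for a znode store (assumption for the example). */
  static class FakeZk {
    private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

    void create(String path, byte[] data) {
      nodes.put(path, data);
    }

    void delete(String path) {
      nodes.remove(path);
    }

    byte[] getData(String path) throws NoNodeException {
      byte[] data = nodes.get(path);
      if (data == null) {
        throw new NoNodeException(path);
      }
      return data;
    }
  }

  /** Reads a node, mapping "node vanished" to an empty result instead of failing. */
  static Optional<byte[]> readIfPresent(FakeZk zk, String path) {
    try {
      return Optional.of(zk.getData(path));
    } catch (NoNodeException e) {
      // The worker died after its path was listed; skip it.
      return Optional.empty();
    }
  }

  /** Returns the listed paths whose nodes still existed when they were read. */
  static List<String> readHealthyWorkers(FakeZk zk, List<String> listedPaths) {
    List<String> alive = new ArrayList<>();
    for (String path : listedPaths) {
      if (readIfPresent(zk, path).isPresent()) {
        alive.add(path);
      }
    }
    return alive;
  }
}
```

In this sketch the caller lists worker paths first and reads them afterwards, so a deletion in between simply shortens the result. Whether skipping the dead worker is the right behavior for checkpointing in Giraph is an open question for the actual fix.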

