giraph-dev mailing list archives

From "Zhiwei Gu (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-154) Worker ports are not synched properly with its peers
Date Wed, 14 Mar 2012 20:22:37 GMT
Worker ports are not synched properly with its peers
----------------------------------------------------

                 Key: GIRAPH-154
                 URL: https://issues.apache.org/jira/browse/GIRAPH-154
             Project: Giraph
          Issue Type: Bug
          Components: bsp
    Affects Versions: 0.2.0
            Reporter: Zhiwei Gu
            Assignee: Zhiwei Gu


When a worker tries multiple ports to set up the RPC server, the port it finally binds to is not
synced properly with its peer workers, so the peer workers send messages to the default port.
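
To make the failure mode concrete, here is a minimal sketch of the idea, not Giraph's actual code. In the logs below, the registered port 35061 equals the base port 34900 plus MRpartition 161, while the server reports binding on 36061. The class and method names (PortSyncSketch, defaultPort, bindWithRetries, workerInfo) and the port step are hypothetical, chosen only for illustration. The point is that the worker info published to peers should be built from the port actually bound, not from the default one.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative sketch only; none of these names come from Giraph.
public class PortSyncSketch {
    /** Assumed default port: base port plus the task partition. */
    public static int defaultPort(int basePort, int taskPartition) {
        return basePort + taskPartition;
    }

    /** Tries firstPort, firstPort + portStep, ... and returns the bound socket. */
    public static ServerSocket bindWithRetries(int firstPort, int portStep, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = firstPort + attempt * portStep;
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(candidate));
                return socket;
            } catch (IOException e) {
                // Port unavailable: release the socket and try the next candidate.
                socket.close();
                last = e;
            }
        }
        throw new IOException("Could not bind after " + maxAttempts + " attempts", last);
    }

    /** Builds the info advertised to peers from the port that was really bound. */
    public static String workerInfo(String hostname, int taskPartition, ServerSocket bound) {
        return "Worker(hostname=" + hostname + ", MRpartition=" + taskPartition
                + ", port=" + bound.getLocalPort() + ")";
    }

    public static void main(String[] args) throws IOException {
        // Occupy a port so that the first bind attempt fails, as in the report.
        try (ServerSocket occupied = new ServerSocket(0);
             ServerSocket bound = bindWithRetries(occupied.getLocalPort(), 1, 50)) {
            System.out.println("default port (in use): " + occupied.getLocalPort());
            System.out.println("advertised: " + workerInfo("localhost", 161, bound));
        }
    }
}
```

Advertising `bound.getLocalPort()` rather than the value of `defaultPort(...)` is what keeps the peers and the server in agreement when an earlier bind attempt fails.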

Here are some logs:

############################################################################
Base port: 34900
############################################################################

############################################################################
log for worker 161:
############################################################################
IPC Server handler 98 on 36061: starting
BasicRPCCommunications: Started RPC communication server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061
with 100 handlers and 199 flush threads on bind attempt 1
IPC Server handler 99 on 36061: starting
setup: Registering health of this worker...
getJobState: Job state already exists (/_hadoopBsp/job_201203130609_14838/_masterJobState)
getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already
exists!
getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already
exists!
registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161
and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, port=35061)
process: partitionAssignmentsReadyChanged (partitions are assigned)
startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range
assignments are done in /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
0 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
1 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
2 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
3 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
4 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
5 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
6 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
7 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
8 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
9 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
10 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
11 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
12 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
13 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
14 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
15 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
16 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
17 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
18 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
19 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
20 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
21 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
22 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
23 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
24 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
25 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
26 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
27 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
28 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
29 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
30 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
31 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
32 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
33 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
34 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
35 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
36 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
37 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
38 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
39 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
40 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
41 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
42 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
43 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
44 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
45 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
46 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
47 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
48 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried
49 time(s).
PriviledgedActionException as:job_201203130609_14838 (auth:SIMPLE) cause:java.net.ConnectException:
Call to gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed on connection exception:
java.net.ConnectException: Connection refused
connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=33,cur=Worker(hostname=gsta32085.tan.ygrid.yahoo.com,
MRpartition=161, port=35061),prev=null,ckpt_file=null)
java.net.ConnectException: Call to gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed
on connection exception: java.net.ConnectException: Connection refused
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
	at org.apache.hadoop.ipc.Client.call(Client.java:1071)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy8.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:420)
	at org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:159)
	at org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:155)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
	at org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:153)
	at org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:51)
	at org.apache.giraph.comm.BasicRPCCommunications.startPeerConnectionThread(BasicRPCCommunications.java:599)
	at org.apache.giraph.comm.BasicRPCCommunications.connectAllRPCProxys(BasicRPCCommunications.java:542)
	at org.apache.giraph.comm.BasicRPCCommunications.setup(BasicRPCCommunications.java:513)
	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:550)
	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
	at org.apache.hadoop.ipc.Client.call(Client.java:1046)
	... 25 more


############################################################################
log for worker 154:
############################################################################
PriviledgedActionException as:job_201203130609_14838 (auth:SIMPLE) cause:java.net.ConnectException:
Call to gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed on connection exception:
java.net.ConnectException: Connection refused
connectAllRPCProxys: Failed on attempt 4 of 5 to connect to (id=33,cur=Worker(hostname=gsta32085.tan.ygrid.yahoo.com,
MRpartition=161, port=35061),prev=null,ckpt_file=null)
java.net.ConnectException: Call to gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed
on connection exception: java.net.ConnectException: Connection refused
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
	at org.apache.hadoop.ipc.Client.call(Client.java:1071)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy8.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:420)
	at org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:159)
	at org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:155)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
	at org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:153)
	at org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:51)
	at org.apache.giraph.comm.BasicRPCCommunications.startPeerConnectionThread(BasicRPCCommunications.java:599)
	at org.apache.giraph.comm.BasicRPCCommunications.connectAllRPCProxys(BasicRPCCommunications.java:542)
	at org.apache.giraph.comm.BasicRPCCommunications.setup(BasicRPCCommunications.java:513)
	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:550)
	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
	at org.apache.hadoop.ipc.Client.call(Client.java:1046)
	... 25 more



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
