spark-user mailing list archives

From "jw.cmu" <jinliangw...@gmail.com>
Subject PySpark crashed because "remote RPC client disassociated"
Date Wed, 29 Jun 2016 21:52:37 GMT
I am running my own PySpark application (solving matrix factorization with
Gemulla's DSGD algorithm). The program seemed to work fine on the smaller
MovieLens dataset but failed on the larger Netflix data. It took about 14
hours to complete two iterations, and then lost an executor (I used 8
executors in total, all on one 16-core machine) because of "remote RPC
client disassociated".

Below is the full error message. I would appreciate any pointers on
debugging this problem. Thanks!

16/06/29 12:43:50 WARN TaskSetManager: Lost task 7.0 in stage 2581.0 (TID 9304, no139.nome.nx): TaskKilled (killed intentionally)
16/06/29 12:43:53 WARN TaskSetManager: Lost task 6.0 in stage 2581.0 (TID 9303, no139.nome.nx): TaskKilled (killed intentionally)
16/06/29 12:43:53 WARN TaskSetManager: Lost task 2.0 in stage 2581.0 (TID 9299, no139.nome.nx): TaskKilled (killed intentionally)
16/06/29 12:43:53 INFO TaskSchedulerImpl: Removed TaskSet 2581.0, whose tasks have all completed, from pool
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2581.0 failed 1 times, most recent failure: Lost task 3.0 in stage 2581.0 (TID 9300, no139.nome.nx): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.GeneratedMethodAccessor96.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
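In case it is relevant: since the failure reason mentions containers exceeding memory thresholds, my understanding is that one common first step is to give each executor (and its Python workers) more memory headroom. The property names below are standard Spark settings; the values are only guesses for my single 16-core machine, not recommendations:

```
# spark-defaults.conf -- values are illustrative guesses, tune per machine
spark.executor.memory             8g    # JVM heap per executor
spark.python.worker.memory        2g    # per-Python-worker aggregation memory before spilling to disk
spark.executor.heartbeatInterval  20s   # more tolerant of long GC pauses before the driver drops the executor
```

Would raising these (or the memory fraction settings) be the right direction, or does this error usually indicate something else?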




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-crashed-because-remote-RPC-client-disassociated-tp27248.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

