spark-user mailing list archives

From Joaquin Alzola <Joaquin.Alz...@lebara.com>
Subject RE: Remote RPC client disassociated
Date Fri, 01 Jul 2016 10:45:38 GMT
Hi Akhil,

I am using:
Cassandra: 3.0.5
Spark: 1.6.1
Scala: 2.10
Spark-Cassandra connector: 1.6.0
pyspark-cassandra: 0.3.5 (the TargetHolding jar passed with --jars below)

From: Akhil Das [mailto:akhld@hacked.work]
Sent: 01 July 2016 11:38
To: Joaquin Alzola <Joaquin.Alzola@lebara.com>
Cc: user@spark.apache.org
Subject: Re: Remote RPC client disassociated

This looks like a version conflict. Which version of Spark are you using? The Cassandra connector
you are using is built for Scala 2.10 and Spark 1.6. The java.lang.AbstractMethodError in your
executor stderr is the telltale sign: the JVM throws it when a class compiled against an older
interface is invoked through a newer one, so the pyspark-cassandra jar and
spark-cassandra-connector 1.6.0 most likely disagree on the RowReader.read signature. The
executors then crash as soon as a row is read, and on the driver that surfaces only as
"Remote RPC client disassociated".
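One way to rule that out is to let spark-submit resolve pyspark-cassandra together with the
connector build it was compiled against, instead of combining a local pyspark-cassandra jar with
a separately pinned connector. A rough, untested sketch (the package coordinate is my assumption;
check spark-packages.org for the exact one):

    # Assumed Spark Packages coordinate; verify on spark-packages.org
    bin/spark-submit \
      --packages TargetHolding:pyspark-cassandra:0.3.5 \
      spark_v2.py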
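Alternatively, you could drop pyspark-cassandra entirely and read the table through the DataFrame
source that ships inside spark-cassandra-connector 1.6.0. A minimal, untested sketch using the
keyspace and table names from your script:

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName("test") \
        .setMaster("spark://192.168.23.31:7077") \
        .set("spark.cassandra.connection.host", "192.168.23.31")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    # Read through the connector's DataFrame source;
    # no pyspark-cassandra jar involved.
    df = (sqlContext.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="lebara_diameter_codes",
                   table="nl_lebara_diameter_codes")
          .load())

    # Same aggregation as in spark_v2.py, now as a DataFrame operation.
    print(df.groupBy("errorcode2001").count().collect())

You would still pass --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 to
spark-submit, but nothing else.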

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <Joaquin.Alzola@lebara.com> wrote:
Hi List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0
--jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077") \
    .set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()


The error I get when running the above command:

[Stage 0:>                                                          (0 + 3) / 7]
16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                          (0 + 7) / 7]
16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                          (0 + 5) / 7]
16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                          (0 + 4) / 7]
16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
    food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed
4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 14, as4): ExecutorLostFailure
(executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated.
Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN
messages.
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


From the Spark web UI (stderr log page for app-20160630104030-0086/4):

16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
              at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
              at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
              at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
              at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
              at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
              at scala.collection.Iterator$class.foreach(Iterator.scala:727)
              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
              at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
              at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
              at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
16/06/30 10:44:34 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]
java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
              at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
              at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
              at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
              at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
              at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
              at scala.collection.Iterator$class.foreach(Iterator.scala:727)
              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
              at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
              at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
               at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
BR

Joaquin



--
Cheers!
