Can you try the Cassandra connector 1.5? According to the version-compatibility table in their documentation (https://github.com/datastax/spark-cassandra-connector#version-compatibility), it is also compatible with Spark 1.6. You can also cross-post the question to the connector user list: https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
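
If you do downgrade, only the connector coordinate in your spark-submit command needs to change, e.g. --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0 (please check Maven Central for the latest 1.5.x release, I am quoting 1.5.0 from memory). The AbstractMethodError below also suggests the TargetHolding pyspark-cassandra 0.3.5 jar was compiled against an older connector API, so it should line up better with a 1.5 connector than with 1.6.0.

If you want to take pyspark_cassandra out of the picture entirely, a minimal sketch of the same count through the connector's Data Source API (untested on my side, and assuming your keyspace and table names) would be:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf().setAppName("test")
        .setMaster("spark://192.168.23.31:7077")
        .set("spark.cassandra.connection.host", "192.168.23.31"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the table through the connector's Data Source API rather than
# pyspark_cassandra's DeferringRowReader (the class in the stack trace).
table = (sqlContext.read
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="lebara_diameter_codes", table="nl_lebara_diameter_codes")
         .load())

food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
print(food_count.collect())

With that approach you only need the connector on --packages and can drop the --jars entry for pyspark-cassandra.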

On Fri, Jul 1, 2016 at 5:45 PM, Joaquin Alzola <Joaquin.Alzola@lebara.com> wrote:

Hi Akhil,

 

I am using:

Cassandra: 3.0.5

Spark: 1.6.1

Scala: 2.10

Spark-Cassandra connector: 1.6.0

 

From: Akhil Das [mailto:akhld@hacked.work]
Sent: 01 July 2016 11:38
To: Joaquin Alzola <Joaquin.Alzola@lebara.com>
Cc: user@spark.apache.org
Subject: Re: Remote RPC client disassociated

 

This looks like a version conflict. Which version of Spark are you using? The Cassandra connector you are using is built for Scala 2.10 and Spark 1.6.

 

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <Joaquin.Alzola@lebara.com> wrote:

Hi list,

 

I am launching this spark-submit job:

 

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

 

spark_v2.py is:

from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes", "nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()

 

 

This is the error I get when running the above command:

 

[Stage 0:>                                                          (0 + 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

[Stage 0:>                                                          (0 + 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

[Stage 0:>                                                          (0 + 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

[Stage 0:>                                                          (0 + 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; aborting job

Traceback (most recent call last):

  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>

    food_count = table.select("errorcode2001").groupBy("errorcode2001").count()

  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count

  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum

  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold

  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect

  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__

  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Driver stacktrace:

        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)

        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)

        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)

        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)

       at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)

        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)

        at scala.Option.foreach(Option.scala:236)

        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)

        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)

        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)

        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)

        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)

        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)

        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)

        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)

        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)

        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)

        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)

        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:498)

        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)

        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)

        at py4j.Gateway.invoke(Gateway.java:259)

        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)

        at py4j.commands.CallCommand.execute(CallCommand.java:79)

        at py4j.GatewayConnection.run(GatewayConnection.java:209)

        at java.lang.Thread.run(Thread.java:745)

 

 

From the Spark web UI (stderr log page for app-20160630104030-0086/4):

16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python

java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;

              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)

              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)

              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

              at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)

              at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)

              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

              at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)

              at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)

              at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)

              at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)

              at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

              at scala.collection.Iterator$class.foreach(Iterator.scala:727)

              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

              at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)

              at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)

              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)

              at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)

16/06/30 10:44:34 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]

java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;

              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)

              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)

              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

              at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)

              at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)

              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

              at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)

              at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)

              at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)

              at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)

              at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

              at scala.collection.Iterator$class.foreach(Iterator.scala:727)

              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

              at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)

              at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)

              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)

               at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)

BR

 

Joaquin




 


 




--
Cheers!