spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anoop Shiralige <anoop.shiral...@gmail.com>
Subject Unexpected element type class
Date Sun, 07 Feb 2016 13:14:49 GMT
Hi All,

I have written some functions in scala, which I want to expose in pyspark
(interactively, spark - 1.6.0). 
The scala function(loadAvro) writtens a JavaRDD[AvroGenericRecord].
AvroGenericRecord is my wrapper class over the
/org.apache.avro.generic.GenericRecord/. I am trying to convert this JavaRDD
into pyspark RDD and from there on try and use the APIs available. But, I
get that "Unexpected element type class exception". what can i do to resolve
this ?! 

Thanks,
AnoopShiralige

*Below is the code :*


avroData = sc._jvm.dummy.commons.avro.AvroParser.loadAvro("<path to avro
files>", sc._jsc)
from pyspark.rdd import RDD
pythonRDD=RDD(avroData,sc)
pythonRDD.collect()


*loadAvro definition* : 
def loadAvro(path: String, sc: org.apache.spark.api.java.JavaSparkContext):
JavaRDD[(AvroGenericRecord)]

*Error I am getting : *

16/02/07 04:06:45 INFO mapreduce.AvroKeyInputFormat: Using a reader schema
equal to the writer schema.
16/02/07 04:06:45 ERROR python.PythonRunner: Python worker exited
unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/worker.py",
line 98, in main
    command = pickleSer._read_with_length(infile)
  File
"/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/serializers.py",
line 156, in _read_with_length
    length = read_int(stream)
  File
"/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/serializers.py",
line 545, in read_int
    raise EOFError
EOFError

        at
org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
        at
org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
        at
org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Unexpected element type class
dummy.commons.avro.AvroGenericRecord
        at
org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:449)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
        at
org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
16/02/07 04:06:45 ERROR python.PythonRunner: This may have been caused by a
prior exception:
org.apache.spark.SparkException: Unexpected element type class
dummy.commons.avro.AvroGenericRecord
        at
org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:449)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
        at
org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
16/02/07 04:06:45 ERROR executor.Executor: Exception in task 0.0 in stage
1.0 (TID 1)
org.apache.spark.SparkException: Unexpected element type class
dummy.commons.avro.AvroGenericRecord
        at
org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:449)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
        at
org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
16/02/07 04:06:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0
(TID 1, localhost): org.apache.spark.SparkException: Unexpected element type
class dummy.commons.avro.AvroGenericRecord
        at
org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:449)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at
org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at
org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
        at
org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unexpected-element-type-class-tp26169.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Mime
View raw message