spark-issues mailing list archives

From "Aaron Defazio (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-6520) Kyro serialization broken in the shell
Date Wed, 25 Mar 2015 01:49:54 GMT

     [ https://issues.apache.org/jira/browse/SPARK-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Aaron Defazio updated SPARK-6520:
---------------------------------
    Description: 
If I start spark as follows:
{quote}
~/spark-1.3.0-bin-hadoop2.4/bin/spark-shell --master local[1] --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
{quote}

Then, using :paste, run:
{quote}
    case class Example(foo: String, bar: String)
    val ex = sc.parallelize(List(Example("foo1", "bar1"), Example("foo2", "bar2"))).collect()
{quote}

I get the error:
{quote}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error constructing instance of class: $line3.$read
Serialization trace:
$VAL10 ($iwC)
$outer ($iwC$$iwC)
$outer ($iwC$$iwC$Example)
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
  at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:979)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1873)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1895)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
{quote}

As far as I can tell, when using :paste, Kryo serialization doesn't work for classes defined
within the same paste block. It does work when the same statements are entered line by line, without :paste.

This issue seems serious to me, since Kryo serialization is virtually mandatory for performance
(default Java serialization is 20x slower on my problem), and I'm assuming feature parity between
spark-shell and spark-submit is a goal.
Note that this is different from SPARK-6497, which covers the case where Kryo is set to require
registration.
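For reference, the usual way to make Kryo aware of application classes is to register them on the SparkConf before the job starts. This is only a configuration sketch, not a fix for this bug: it assumes Example is a pre-compiled class available on the classpath, which is exactly what a case class defined inside a :paste block is not.

```scala
import org.apache.spark.SparkConf

// Sketch only: registering application classes with Kryo up front.
// Assumes Example is compiled onto the classpath (e.g. in an application
// jar used with spark-submit), unlike the REPL-defined class in this report.
case class Example(foo: String, bar: String)

val conf = new SparkConf()
  .setMaster("local[1]")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[Example]))
```

Registration like this avoids Kryo writing full class names per object, but it does not change how the shell wraps pasted definitions in $iwC outer objects, which is where the trace above points.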



> Kyro serialization broken in the shell
> --------------------------------------
>
>                 Key: SPARK-6520
>                 URL: https://issues.apache.org/jira/browse/SPARK-6520
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Aaron Defazio
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


