spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Re: SPARK LIMITATION - more than one case class is not allowed !!
Date Fri, 05 Dec 2014 06:12:31 GMT
Rahul,

On Fri, Dec 5, 2014 at 2:50 PM, Rahul Bindlish <
rahul.bindlish@nectechnologies.in> wrote:
>
> I have done so; that's why Spark is able to load the objectfile [e.g. person_obj]
> and Spark has maintained the serialVersionUID [person_obj].
>
> The next time, when I try to load another objectfile [e.g. office_obj],
> I think Spark matches the serialVersionUID [person_obj] against the previous
> serialVersionUID [person_obj] and gives a mismatch error.
>
> In my first post, I have given statements which can be executed easily to
> replicate this issue.
>

Can you post the Scala source for your case classes? I have tried the
following in spark-shell:

case class Dog(name: String)
case class Cat(age: Int)
val dogs = sc.parallelize(Dog("foo") :: Dog("bar") :: Nil)
val cats = sc.parallelize(Cat(1) :: Cat(2) :: Nil)
dogs.saveAsObjectFile("test_dogs")
cats.saveAsObjectFile("test_cats")

This gives two directories "test_dogs/" and "test_cats/". Then I restarted
spark-shell and entered:

case class Dog(name: String)
case class Cat(age: Int)
val dogs = sc.objectFile("test_dogs")
val cats = sc.objectFile("test_cats")

I don't get an exception, but:

dogs: org.apache.spark.rdd.RDD[Nothing] = FlatMappedRDD[1] at objectFile at <console>:12

Trying to access the elements of the RDD gave:

scala> dogs.collect()
14/12/05 15:08:58 INFO FileInputFormat: Total input paths to process : 8
...
org.apache.spark.SparkDriverExecutionException: Execution error
  at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:980)
  ...
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ArrayStoreException: [Ljava.lang.Object;
  at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
  at org.apache.spark.SparkContext$$anonfun$runJob$3.apply(SparkContext.scala:1129)
  ...
  at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:976)
  ... 10 more

So even in the simplest of cases, this doesn't work for me in spark-shell,
although with a different error than the one you reported. I guess we need
to see more of your code to help.
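For what it's worth, the `RDD[Nothing]` above hints that `objectFile` was called
without an explicit type parameter, which is what makes `collect()` blow up with
the ArrayStoreException. A sketch of what I would try next, assuming the same
Dog/Cat case classes from my session above (I have not verified that this also
resolves your serialVersionUID mismatch):

```scala
// Redefine the case classes in the new spark-shell session so the
// deserializer can find them.
case class Dog(name: String)
case class Cat(age: Int)

// Pass the element type explicitly; this yields RDD[Dog] / RDD[Cat]
// instead of RDD[Nothing].
val dogs = sc.objectFile[Dog]("test_dogs")
val cats = sc.objectFile[Cat]("test_cats")

dogs.collect().foreach(println)
cats.collect().foreach(println)
```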

Tobias
