spark-user mailing list archives

From Daniel Darabos <daniel.dara...@lynxanalytics.com>
Subject Re: SPARK LIMITATION - more than one case class is not allowed !!
Date Fri, 05 Dec 2014 13:56:53 GMT
On Fri, Dec 5, 2014 at 7:12 AM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Rahul,
>
> On Fri, Dec 5, 2014 at 2:50 PM, Rahul Bindlish <
> rahul.bindlish@nectechnologies.in> wrote:
>>
>> I have done so; that's why Spark is able to load the object file [e.g.
>> person_obj], and Spark has retained the serialVersionUID [person_obj].
>>
>> The next time, when I try to load another object file [e.g. office_obj],
>> I think Spark is matching its serialVersionUID [office_obj] against the
>> previous serialVersionUID [person_obj] and giving a mismatch error.
>>
>> In my first post, I gave statements that can be executed easily to
>> replicate this issue.
>>
>
> Can you post the Scala source for your case classes? I have tried the
> following in spark-shell:
>
> case class Dog(name: String)
> case class Cat(age: Int)
> val dogs = sc.parallelize(Dog("foo") :: Dog("bar") :: Nil)
> val cats = sc.parallelize(Cat(1) :: Cat(2) :: Nil)
> dogs.saveAsObjectFile("test_dogs")
> cats.saveAsObjectFile("test_cats")
>
> This gives two directories "test_dogs/" and "test_cats/". Then I restarted
> spark-shell and entered:
>
> case class Dog(name: String)
> case class Cat(age: Int)
> val dogs = sc.objectFile("test_dogs")
> val cats = sc.objectFile("test_cats")
>
> I don't get an exception, but:
>
> dogs: org.apache.spark.rdd.RDD[Nothing] = FlatMappedRDD[1] at objectFile
> at <console>:12
>

You need to specify the element type of the RDD. The compiler cannot know
what is in "test_dogs", so without a type parameter it infers Nothing:

val dogs = sc.objectFile[Dog]("test_dogs")
val cats = sc.objectFile[Cat]("test_cats")
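The inference to Nothing can be reproduced without Spark. This sketch assumes `SparkContext.objectFile` is declared roughly as `objectFile[T: ClassTag](path: String, ...)`, so `T` appears only in the result type and nothing in the arguments pins it down. `InferenceDemo` and `loadLike` are hypothetical names; `loadLike` simply returns the `ClassTag` the compiler chose for `T`.

```scala
import scala.reflect.ClassTag

case class Dog(name: String)

object InferenceDemo {
  // Hypothetical stand-in for SparkContext.objectFile: T appears only in the
  // result type, so the String argument gives the compiler nothing to infer it from.
  def loadLike[T: ClassTag](path: String): ClassTag[T] = implicitly[ClassTag[T]]

  def main(args: Array[String]): Unit = {
    val untyped = loadLike("test_dogs")      // T is inferred as Nothing
    val typed   = loadLike[Dog]("test_dogs") // T is Dog, as intended

    println(untyped == ClassTag.Nothing)        // true
    println(typed.runtimeClass == classOf[Dog]) // true
  }
}
```

With the type parameter omitted the compiler falls back to Nothing, which matches the RDD[Nothing] in the REPL output quoted above.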

It's an easy mistake to make... I wonder if an assertion could be
implemented that makes sure the type parameter is present.
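A minimal sketch of such an assertion, assuming a runtime check is acceptable: when the type parameter is omitted, the implicit ClassTag the compiler supplies is the one for Nothing, so a wrapper can reject it before doing any work. `CheckedLoad`, `checkedLoad` and `requireElementType` are hypothetical names, not Spark API; a real wrapper would call `sc.objectFile[T](path)` after the check, which is replaced by an empty Vector here so the example runs without Spark.

```scala
import scala.reflect.ClassTag
import scala.util.Try

case class Cat(age: Int)

object CheckedLoad {
  // Hypothetical guard: fails fast when T was inferred as Nothing,
  // which is what the compiler picks when the caller leaves out [T].
  def requireElementType[T](path: String)(implicit ct: ClassTag[T]): Unit =
    require(ct != ClassTag.Nothing,
      s"Element type for '$path' was inferred as Nothing; pass it explicitly, e.g. checkedLoad[Cat]")

  // Hypothetical wrapper: a real version would return sc.objectFile[T](path)
  // after the check; an empty Vector stands in so this runs without Spark.
  def checkedLoad[T: ClassTag](path: String): Vector[T] = {
    requireElementType[T](path)
    Vector.empty[T]
  }

  def main(args: Array[String]): Unit = {
    println(checkedLoad[Cat]("test_cats"))           // Vector()
    println(Try(checkedLoad("test_cats")).isFailure) // true
  }
}
```

This only catches the mistake at runtime, when the call is made; a call like sc.objectFile("test_dogs") itself would still compile.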
