spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Pattern Matching / Equals on Case Classes in Spark Not Working
Date Mon, 12 Jan 2015 18:53:43 GMT
Is this in the Spark shell? Case classes don't work correctly in the Spark shell unfortunately
(though they do work in the Scala shell) because we change the way lines of code compile to
allow shipping functions across the network. The best way to get case classes in there is
to compile them into a JAR and then add that to your spark-shell's classpath with --jars.

Matei
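
A minimal sketch of the suggestion above, assuming the case class from the quoted message is moved into its own source file and compiled outside the shell. The object name CustomerCheck and the JAR name in the comment are hypothetical; only the --jars flag comes from the reply itself.

```scala
import java.util.Date

// Compiled ahead of time (not typed into the shell), so the compiler-generated
// equals, hashCode and extractor are used exactly as in a normal Scala program.
final case class Customer(id: String, age: Option[Int], entryDate: Option[Date])

// Hypothetical sanity check; after packaging, the shell would be started with
// something like: spark-shell --jars customer-model.jar   (JAR name is an assumption)
object CustomerCheck {
  def main(args: Array[String]): Unit = {
    val a = Customer("a", None, None)
    val b = Customer("a", None, None)
    println(a == b)                   // structural equality on the fields
    println(Seq(a, b).distinct.size)  // number of distinct customers
  }
}
```

With the class on the classpath, no hand-written equals should be needed; whether distinct on the RDD then behaves as expected still has to be verified in the actual job.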

> On Jan 12, 2015, at 10:04 AM, Rosner, Frank (Allianz SE) <FRANK.ROSNER@ALLIANZ.COM> wrote:
> 
> Dear Spark Users,
>  
> I have searched the web for several hours but have not found a solution to my problem, so maybe someone on this list can help.
>  
> I have an RDD of case class instances, generated from CSV files with Spark. When I used the distinct operator, duplicates remained. I investigated and found that equals returns false although the two objects are equal (their individual fields match, and so do their toString results).
>  
> After some searching I found that case class equals can break when the two objects are created by different class loaders. So I implemented my own equals method using pattern matching (code example below). It still didn't work. Some debugging revealed that the problem lies in the pattern matching itself. Depending on which objects I compare (and maybe on the split / class loader they were created in?), the pattern matching either works or doesn't:
>  
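> case class Customer(id: String, age: Option[Int], entryDate: Option[java.util.Date]) {
>   // `override` is required: this replaces the equals the compiler would generate
>   override def equals(that: Any): Boolean = that match {
>     case Customer(id, age, entryDate) => {
>       println("Pattern matching works!")
>       // the pattern variables shadow the fields, hence the explicit `this.`
>       this.id == id && this.age == age && this.entryDate == entryDate
>     }
>     case _ => false
>   }
> }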
>  
> //val x: Array[Customer]
> // ... some spark code to filter original data and collect x
>  
> scala> x(0)
> Customer("a", Some(5), Some(Fri Sep 23 00:00:00 CEST 1994))
> scala> x(1)
> Customer("a", None, None)
> scala> x(2)
> Customer("a", None, None)
> scala> x(3)
> Customer("a", None, None)
>  
> scala> x(0) == x(0) // should be true and works
> Pattern matching works!
> res0: Boolean = true
> scala> x(0) == x(1) // should be false and works
> Pattern matching works!
> res1: Boolean = false
> scala> x(1) == x(2) // should be true, does not work
> res2: Boolean = false
> scala> x(2) == x(3) // should be true and works
> Pattern matching works!
> res3: Boolean = true
> scala> x(0) == x(3) // should be false; the result is right, but the pattern never matches
> res4: Boolean = false
>  
> Why is the pattern matching not working? There seem to be two kinds of Customers, 0,1 and 2,3, which somehow don't match each other. Is this related to class loaders? Is there a way around this other than using instanceof and defining a custom equals operation for every case class I write?
>  
> Thanks for the help!
> Frank
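
One way to test the class loader hypothesis raised in the question is to compare the runtime classes of two elements directly. The sketch below is only a diagnostic aid; ClassIdentity and its methods are hypothetical names, not part of Spark or of the original code.

```scala
object ClassIdentity {
  // Describes the runtime class of a value and the loader that defined it.
  def describe(value: AnyRef): String = {
    val cls = value.getClass
    s"${cls.getName} (loader: ${cls.getClassLoader})"
  }

  // True only when both values share the very same Class object,
  // i.e. the same class name defined by the same class loader.
  def sameRuntimeClass(a: AnyRef, b: AnyRef): Boolean =
    a.getClass eq b.getClass
}
```

In the shell this could be called as ClassIdentity.sameRuntimeClass(x(1), x(2)). If it returns true for a pair that still fails to match, the class loader is probably not the cause, and the way the shell compiles each line, as described in the reply, becomes the more likely explanation.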

