spark-user mailing list archives

From Pierre B <pierre.borckm...@realimpactanalytics.com>
Subject Nested method in a class: Task not serializable?
Date Fri, 16 May 2014 21:15:55 GMT
Hi!

I understand the usual "Task not serializable" issue that arises when a
closure accesses a field or a method of the enclosing class, which forces
the whole instance to be serialized.

To fix it, I usually define a local copy of these fields/methods, which
avoids the need to serialize the whole class:

class MyClass(val myField: Any) {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")

    val myField = this.myField
    println(f.map( _ + myField ).count)
  }
}
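
To make the difference concrete without a cluster, here is a minimal sketch that uses plain Java serialization as a stand-in for what Spark does to a task closure. The names Holder, ClosureCheck and isSerializable are hypothetical and only exist for this illustration; it also assumes the enclosing class is not itself Serializable, as in the example above.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for MyClass: note that it is NOT Serializable.
class Holder(val myField: String) {
  // References the field directly, so the closure has to capture `this`.
  def capturingThis: String => String = (s: String) => s + myField

  // Copies the field into a local val first, so only that value is captured.
  def localCopy: String => String = {
    val local = myField
    (s: String) => s + local
  }
}

object ClosureCheck {
  // Tries to serialize obj the way a task closure would be serialized.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Under these assumptions, ClosureCheck.isSerializable(new Holder("x").capturingThis) should come back false, while the localCopy variant should come back true.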

===================

Now, if I define a nested function in the run method, the task cannot be
serialized:

class MyClass() {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")

    def mapFn(line: String) = line.split(";")

    println(f.map( mapFn( _ ) ).count)
  }
}

I don't understand since I thought "mapFn" would be in scope...
Even stranger, if I define mapFn to be a val instead of a def, then it
works:

class MyClass() {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")

    val mapFn = (line: String) => line.split(";")
   
    println(f.map( mapFn( _ ) ).count)    
  }
}
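
The val variant can also be checked outside Spark. The sketch below is only an illustration under stated assumptions: Runner, ValFnCheck and isSerializable are hypothetical names, and plain Java serialization stands in for Spark's closure serialization. It builds the same closure as above inside a class that is not Serializable and tests whether it can be written out.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for MyClass: again NOT Serializable.
class Runner {
  // Same shape as the working example: mapFn is a function value (an object),
  // and the returned closure captures that value rather than `this`.
  def closure: String => Array[String] = {
    val mapFn = (line: String) => line.split(";")
    (s: String) => mapFn(s)
  }
}

object ValFnCheck {
  // Tries to serialize obj the way a task closure would be serialized.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Here ValFnCheck.isSerializable(new Runner().closure) should come back true, which matches what I see with Spark for the val version.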

Is this related to the way Scala represents nested functions?

What's the recommended way to deal with this issue?

Thanks for your help,

Pierre



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Nested-method-in-a-class-Task-not-serializable-tp5869.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
