spark-dev mailing list archives

From "Ulanov, Alexander" <alexander.ula...@hp.com>
Subject Pass parameters to RDD functions
Date Thu, 03 Jul 2014 11:24:35 GMT
Hi,

I wonder how I can pass parameters to RDD functions via closures. If I do it in the following
way, Spark crashes with a NotSerializableException:

import org.apache.spark.rdd.RDD

class TextToWordVector(csvData: RDD[Array[String]]) {

  val n = 1
  // Referencing the field n inside the closure forces Spark to serialize
  // the enclosing TextToWordVector instance, which is not serializable.
  lazy val x = csvData.map { stringArr => stringArr(n) }.collect()
}

Exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable:
java.io.NotSerializableException: org.apache.spark.mllib.util.TextToWordVector
                at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1038)


This message proposes a workaround, but it didn't work for me:
http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAA_qdLrxXzwXd5=6SXLOgSmTTorpOADHjnOXn=tMrOLEJM=Frw@mail.gmail.com%3E
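
If I understand that thread correctly, the suggestion is to copy the field into a local value
before building the closure, so that the closure captures only the local instead of the whole
object. Roughly like this (localN is just my name for the copy):

import org.apache.spark.rdd.RDD

class TextToWordVector(csvData: RDD[Array[String]]) {

  val n = 1
  lazy val x = {
    // Copy the field into a local val first; the closure should then
    // capture only localN, not the non-serializable enclosing instance.
    val localN = n
    csvData.map { stringArr => stringArr(localN) }.collect()
  }
}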

What is the best practice?

Best regards, Alexander
