spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tobias Pfeiffer <...@preferred.jp>
Subject Why must the dstream.foreachRDD(...) parameter be serializable?
Date Wed, 28 Jan 2015 02:15:37 GMT
Hi,

I want to do something like

dstream.foreachRDD(rdd => if (someCondition) ssc.stop())

so in particular the function does not touch any element in the RDD and
runs completely within the driver. However, this fails with a
NotSerializableException because $outer is not serializable etc. The
DStream code says:

  def foreachRDD(foreachFunc: (RDD[T], Time) => Unit) {
    // because the DStream is reachable from the outer object here, and
because
    // DStreams can't be serialized with closures, we can't proactively
check
    // it for serializability and so we pass the optional false to
SparkContext.clean
    new ForEachDStream(this, context.sparkContext.clean(foreachFunc,
false)).register()
  }

To be honest, I don't understand the comment. Why must that function be
serializable even when there is no RDD action involved?

Thanks
Tobias

Mime
View raw message