spark-user mailing list archives

From Stephen Boesch <>
Subject Re: Equivalent of collect() on DStream
Date Thu, 15 May 2014 04:33:38 GMT
Looking further, it appears the functionality I am seeking is in the
following *private[spark]* class, ForEachDStream

(version 0.8.1; yes, we are presently using an older release.)

class ForEachDStream[T: ClassManifest] (
    parent: DStream[T],
*    foreachFunc: (RDD[T], Time) => Unit*
  ) extends DStream[Unit](parent.ssc) {

I would like to have access to this structure - particularly the ability to
define a "foreachFunc" that gets applied to each RDD within the DStream.
Is there a means to do so?
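
To make the request concrete, here is a rough sketch of the usage I have in
mind. It is written against the public foreachRDD method that DStream exposes
in later Spark releases (1.3 or newer is assumed because of
awaitTerminationOrTimeout); I believe the 0.8.x counterpart is a public method
named foreach that takes the same (RDD[T], Time) => Unit function, but I have
not confirmed that. The object name, the queue-backed test input and the
one-second batch interval are made up for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import scala.collection.mutable

// Hypothetical example object; not part of Spark.
object CollectFromDStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("CollectFromDStream")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed test input: a queue of RDDs, consumed one RDD per batch.
    val queue = new mutable.Queue[RDD[Int]]()
    queue += ssc.sparkContext.parallelize(1 to 5)
    val stream = ssc.queueStream(queue)

    // The function passed here plays the role of foreachFunc: it runs on
    // the driver once per batch and receives that batch's RDD and time.
    stream.foreachRDD { (rdd: RDD[Int], time: Time) =>
      // collect() pulls the batch into driver (JVM) memory, so this is
      // only reasonable when each batch is small.
      val batch: Array[Int] = rdd.collect()
      println(s"Batch at $time: ${batch.mkString(", ")}")
    }

    ssc.start()
    // Let a few batches run, then shut down.
    ssc.awaitTerminationOrTimeout(3000)
    ssc.stop()
  }
}
```

The point is that collect() is called on each RDD inside the function rather
than on the DStream itself, which is the in-memory access I was looking for.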

2014-05-14 21:25 GMT-07:00 Stephen Boesch <>:

> Given that collect() apparently does not exist on DStream, my mental model
> of Streaming RDD (DStream) needs correction/refinement.  So what is the
> means to convert DStream data into a JVM in-memory representation?  All of
> the methods on DStream, e.g. filter, map, transform, reduce, etc., generate
> other DStreams, and not an in-memory data structure.
