spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <so...@cloudera.com>
Subject Re: Equivalent of collect() on DStream
Date Thu, 15 May 2014 10:54:30 GMT
Are you not just looking for the foreachRDD() method on DStream?
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html#output-operations

It gives you an RDD that you can do what you want with, including collect() it.

On Thu, May 15, 2014 at 5:33 AM, Stephen Boesch <javadba@gmail.com> wrote:
> Looking further it appears the functionality I am seeking is in the
> following private[spark]  class ForEachdStream
>
> (version 0.8.1 , yes we are presently using an older release..)
>
> private[streaming]
> class ForEachDStream[T: ClassManifest] (
>     parent: DStream[T],
>     foreachFunc: (RDD[T], Time) => Unit
>   ) extends DStream[Unit](parent.ssc) {
>
> I would like to have access to this structure - particularly the ability to
> define an "foreachFunc" that gets applied to each RDD within the DStream.
> Is there a means to do so?
>
>
>
> 2014-05-14 21:25 GMT-07:00 Stephen Boesch <javadba@gmail.com>:
>>
>>
>> Given that collect() does not exist on DStream apparently my mental model
>> of Streaming RDD (DStream) needs correction/refinement.  So what is the
>> means to convert DStream data into a JVM in-memory representation.  All of
>> the methods on DStream i.e. filter, map, transform, reduce, etc generate
>> other DStream's, and not an in memory data structure.
>>
>>
>>
>

Mime
View raw message