spark-user mailing list archives

From Bill Jay <>
Subject Re: Spark Streaming RDD transformation
Date Thu, 26 Jun 2014 20:19:45 GMT
Thanks, Sean!

I am currently using foreachRDD to update the global map using data in each
RDD. The reason I want to return the map as an RDD, instead of just updating
the map, is that RDD provides many handy methods for output. For example, I
want to save the global map to files in HDFS for each batch in the stream. In
this case, do you have any suggestions on how I can easily do that in Spark?
Thanks!
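
For illustration, here is a rough sketch of one way this could be done, not a
tested recipe: pull each batch to the driver with collectAsMap(), merge it into
the global map, then turn the map back into an RDD with parallelize() and write
it with saveAsTextFile(). It uses the Python API for brevity. The names
merge_batch, format_lines, save_global_map_per_batch and out_prefix are
hypothetical, and the sketch assumes that each batch and the global map fit in
driver memory and that a later value for a key should overwrite an earlier one.

```python
from typing import Dict, Iterable, List, Tuple

# Driver-side state; assumed small enough to hold in driver memory.
global_map: Dict[str, int] = {}


def merge_batch(state: Dict[str, int],
                pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Merge one batch of (key, value) pairs into state (last value wins)."""
    for key, value in pairs:
        state[key] = value
    return state


def format_lines(state: Dict[str, int]) -> List[str]:
    """Render the map as 'key<TAB>value' lines, sorted by key."""
    return ["%s\t%s" % (k, v) for k, v in sorted(state.items())]


def save_global_map_per_batch(stream, sc, out_prefix: str) -> None:
    """Hypothetical helper: stream is a DStream of pairs, sc a SparkContext."""

    def update_and_save(batch_time, rdd):
        # collectAsMap() brings the batch to the driver, so the update
        # happens on the driver-side map rather than on an executor copy.
        merge_batch(global_map, rdd.collectAsMap().items())
        # One output directory per batch; the Python API passes the batch
        # time as a datetime when the callback takes two arguments.
        path = "%s-%d" % (out_prefix, int(batch_time.timestamp()))
        sc.parallelize(format_lines(global_map), 1).saveAsTextFile(path)

    stream.foreachRDD(update_and_save)
```

Each batch then writes a full snapshot of the map to its own directory under
out_prefix, for example an HDFS path. Only merge_batch and format_lines run
without a cluster; the helper does nothing until it is given a real DStream.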

On Thu, Jun 26, 2014 at 12:26 PM, Sean Owen <> wrote:

> If you want to transform an RDD to a Map, I assume you have an RDD of
> pairs. The method collectAsMap() creates a Map from the RDD in this
> case.
> Do you mean that you want to update a Map object using data in each
> RDD? You would use foreachRDD() in that case. Then you can use
> RDD.foreach to do something like update a global Map object.
> Not sure if this is what you mean but SparkContext.parallelize() can
> be used to make an RDD from a List or Array of objects. But that's not
> really related to streaming or updating a Map.
> On Thu, Jun 26, 2014 at 1:40 PM, Bill Jay <> wrote:
> > Hi all,
> >
> > I am currently working on a project that requires transforming each RDD
> > in a DStream to a Map. Basically, when we get a list of data in each
> > batch, we would like to update the global map. I would like to return the
> > map as a single RDD.
> >
> > I am currently trying to use the transform function. The output will be an
> > RDD of the updated map after each batch. How can I create an RDD from
> > another data structure such as an Int, a Map, etc.? Thanks!
> >
> > Bill
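
On the original question of creating an RDD from an Int or a Map: a minimal
sketch, assuming the Python API, is to wrap the local value in a list and pass
it to parallelize(). The helpers to_records and map_as_rdd_stream are
hypothetical names, not part of Spark, and the sketch assumes the map fits in
driver memory. Note that it runs an action (collectAsMap) inside transform,
which is done on the driver for every batch.

```python
from typing import Any, List


def to_records(value: Any) -> List[Any]:
    """Turn a local value into a list that parallelize() can accept."""
    if isinstance(value, dict):
        # A map becomes a list of (key, value) pairs, i.e. a pair RDD.
        return list(value.items())
    if isinstance(value, list):
        return list(value)
    # A scalar such as an Int becomes a one-element list.
    return [value]


def map_as_rdd_stream(stream, state: dict):
    """Hypothetical helper: emit the updated map as the RDD of each batch."""

    def update(rdd):
        # Merge the batch into the driver-side map.
        state.update(rdd.collectAsMap())
        # Rebuild an RDD from the map using the batch RDD's SparkContext.
        return rdd.context.parallelize(to_records(state), 1)

    return stream.transform(update)
```

The DStream returned by map_as_rdd_stream can then use the usual output
methods, since every batch is an ordinary RDD of pairs.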
