spark-user mailing list archives

From Silvio Fiorito <silvio.fior...@granturing.com>
Subject Re: "collecting" DStream data
Date Sun, 15 May 2016 12:04:36 GMT
Hi Daniel,

Given your example, “arr” is defined on the driver, but the closure you pass to “r.foreach” runs on the executors, so each executor mutates its own deserialized copy of the map and the driver's “arr” stays empty. If you want to collect the results of the RDD/DStream down to the driver, you need to call RDD.collect. Be careful, though, that you have enough memory on the driver JVM to hold the results, otherwise you'll get an OOM exception. Also, you can't update the value of a broadcast variable after it's created, since broadcasts are immutable.
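As a minimal sketch (reusing the names from your snippet, and assuming the DStream's elements are (String, String) pairs), something like this brings each batch back to the driver before mutating the map, and re-broadcasts a fresh snapshot when the executors need it:

import scala.collection.mutable

val arr = mutable.HashMap.empty[String, String]

MappedVersionsToBrodcast.foreachRDD { rdd =>
  // This block runs on the driver; collect() pulls the batch's
  // elements back from the executors, so mutating arr here really
  // updates the driver-side map (watch driver memory on big batches).
  arr ++= rdd.collect()

  // A broadcast can't be updated in place; if the executors need the
  // current map, broadcast a fresh immutable snapshot each batch.
  val snapshot = rdd.sparkContext.broadcast(arr.toMap)
  // ... use snapshot.value in jobs launched from here ...
}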

Thanks,
Silvio

From: Daniel Haviv <daniel.haviv@veracity-group.com>
Date: Sunday, May 15, 2016 at 6:23 AM
To: user <user@spark.apache.org>
Subject: "collecting" DStream data

Hi,
I have a DStream whose values I'd like to collect and broadcast.
To do so I've created a mutable HashMap which I'm filling with foreachRDD, but when I check
it, it remains empty. If I use an ArrayBuffer it works as expected.

This is my code:

val arr = scala.collection.mutable.HashMap.empty[String, String]
MappedVersionsToBrodcast.foreachRDD { rdd =>
  rdd.foreach { kv => arr += kv }
}


What am I missing here?

Thank you,
Daniel
