spark-user mailing list archives

From Corey Nolet <cjno...@gmail.com>
Subject Accumulators
Date Thu, 15 Jan 2015 02:19:15 GMT
What are the limitations of using Accumulators to get a union of a bunch of
small sets?

Let's say I have an RDD[Map[String, Any]] and I want to do:

rdd.map(m => accumulator += Set(m.get("entityType").get))


What implication does this have on performance? I'm assuming it's not
immediately aggregating each time I call += on the Accumulator. Is it
doing a local combine and then occasionally sending the results for the
current partition back to the driver?
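
To make the model I'm guessing at concrete, here is a minimal plain-Scala sketch with no Spark dependency. It assumes each partition folds its records into one local set and the driver then unions those partial sets. The names SetUnionSketch, combinePartition and mergeAtDriver are hypothetical and not part of any Spark API, and unlike the one-liner above it skips records that have no "entityType" key instead of throwing.

```scala
// Plain-Scala model of "local combine, then merge at the driver".
// This is an illustration of the assumed behavior, not Spark code.
object SetUnionSketch {
  type Record = Map[String, Any]

  // Local combine: fold one partition's records into a single small set.
  // Records without an "entityType" key are skipped (an assumption made here).
  def combinePartition(records: Seq[Record]): Set[Any] =
    records.foldLeft(Set.empty[Any]) { (acc, r) =>
      r.get("entityType").fold(acc)(v => acc + v)
    }

  // Driver-side merge: union the per-partition partial sets.
  def mergeAtDriver(partials: Seq[Set[Any]]): Set[Any] =
    partials.foldLeft(Set.empty[Any])(_ union _)

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions of made-up records.
    val partitions: Seq[Seq[Record]] = Seq(
      Seq(Map("entityType" -> "person"), Map("entityType" -> "place")),
      Seq(Map("entityType" -> "person"), Map("id" -> 7))
    )
    val result = mergeAtDriver(partitions.map(combinePartition))
    println(result)
  }
}
```

If this model is right, the cost per record is one local set insert, and only one small set per partition (or task) travels back to the driver.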
