Btw, here is a great article about accumulators and all their related traps! 
http://imranrashid.com/posts/Spark-Accumulators/ (I'm not the author)

On 16 March 2016 at 18:24, swetha kasireddy <swethakasireddy@gmail.com> wrote:
OK. I did take a look at them. So once I have an accumulator for a HashSet, how can I check whether a particular key is already present in it? I don't see any .contains method there. My requirement is to keep accumulating keys in the HashSet across all tasks on the various nodes, and to use it to check whether a key is already present.
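
A note on why there is no .contains to call from a task: in Spark, tasks can only add to an accumulator, and only the driver program can read its value. The sketch below is a Spark-free model of that behaviour, not real Spark code. SetParam mirrors the two methods of PySpark's AccumulatorParam (zero and addInPlace); run_task and the sample partitions are hypothetical names made up for illustration.

```python
# Minimal, Spark-free model of how a set accumulator behaves.
# SetParam mirrors the two methods of PySpark's AccumulatorParam;
# run_task and the sample data are hypothetical, for illustration only.

class SetParam(object):
    def zero(self, initial_value):
        # Each task starts from an empty set, not from the driver's value.
        return set()

    def addInPlace(self, left, right):
        # Merge two partial results; set union is commutative and associative.
        left |= right
        return left


def run_task(param, partition):
    # A task only ever sees its own local copy of the accumulator.
    local = param.zero(set())
    for key in partition:
        local = param.addInPlace(local, {key})
    return local


param = SetParam()
partitions = [["a", "b"], ["b", "c"], ["a", "d"]]

# Driver side: merge the per-task sets once the tasks have finished.
task_results = [run_task(param, p) for p in partitions]
driver_value = set()
for partial in task_results:
    driver_value = param.addInPlace(driver_value, partial)

# Membership checks are only meaningful on the merged driver value.
seen_on_driver = "c" in driver_value
# The first task never saw "c", so a check inside that task would miss it.
seen_by_first_task = "c" in task_results[0]
```

In real PySpark, a class like this would subclass pyspark.accumulators.AccumulatorParam and be passed as sc.accumulator(set(), SetParam()). The membership test would then run on the driver after an action completes. If the check has to happen inside the tasks themselves, an accumulator is probably the wrong tool.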

On Tue, Mar 15, 2016 at 9:56 PM, pppsunil <pppsunil@gmail.com> wrote:
Have you looked at using the Accumulable interface? Take a look at the Spark
documentation at
http://spark.apache.org/docs/latest/programming-guide.html#accumulators
It gives an example of how to use a vector type for an accumulator, which might
be very close to what you need.



--

Adrien Mogenet
Head of Backend/Infrastructure
50, avenue Montaigne - 75008 Paris