spark-user mailing list archives

From Julien Naour <>
Subject Accumulator and Accumulable vs classic MR
Date Fri, 01 Aug 2014 14:38:20 GMT

My question is simple: are there any performance issues with using
Accumulable/Accumulator instead of methods like map(), reduce(), etc.?

My use case: implementing a clustering algorithm like k-means.
At the beginning I used two steps: one to assign data points to clusters and
another to compute the new centroids.
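For reference, the two-step version might look roughly like this sketch, where `points` (an `RDD[Array[Double]]`), `centroids`, and the `closest` helper are placeholder names of mine, not from the original post:

```scala
// Pass over the data: assign each point to its nearest centroid,
// pairing it with a count of 1 for later averaging.
val assigned = points.map(p => (closest(p, centroids), (p, 1L)))

// Second aggregation step: sum coordinates and counts per cluster,
// then divide to get the new centroids.
val newCentroids = assigned
  .reduceByKey { case ((s1, n1), (s2, n2)) =>
    (s1.zip(s2).map { case (a, b) => a + b }, n1 + n2)
  }
  .mapValues { case (sum, n) => sum.map(_ / n) }
  .collectAsMap()
```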
After some research, I now use an Accumulable holding an Array to compute the
new centroids during the assignment of the data. It's easier to understand
and, for the moment, it gives better performance.
That's probably because I previously made two passes over the data and now
only one, thanks to the accumulable.
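A minimal sketch of how such an accumulable could be wired up, assuming the accumulator state is an array of (coordinate sums, point count) per cluster; the type alias and all names here are my own illustration, not the poster's code:

```scala
import org.apache.spark.AccumulableParam

// Per-cluster running state: (sum of coordinates, number of points).
type CentroidSums = Array[(Array[Double], Long)]

object CentroidParam extends AccumulableParam[CentroidSums, (Int, Array[Double])] {
  // Add one point's coordinates to its assigned cluster (on a worker).
  def addAccumulator(acc: CentroidSums, v: (Int, Array[Double])): CentroidSums = {
    val (cluster, point) = v
    val (sum, count) = acc(cluster)
    for (i <- point.indices) sum(i) += point(i)
    acc(cluster) = (sum, count + 1L)
    acc
  }
  // Merge partial results from two partitions (on the driver).
  def addInPlace(a: CentroidSums, b: CentroidSums): CentroidSums = {
    for (c <- a.indices) {
      val (sa, na) = a(c)
      val (sb, nb) = b(c)
      for (i <- sa.indices) sa(i) += sb(i)
      a(c) = (sa, na + nb)
    }
    a
  }
  // Empty state with the same shape as the initial value.
  def zero(init: CentroidSums): CentroidSums =
    init.map { case (sum, _) => (Array.fill(sum.length)(0.0), 0L) }
}
```

During the single assignment pass, each task would call `acc += (cluster, point)` after finding the nearest centroid, and the driver then divides each cluster's sum by its count to get the new centroids, so assignment and centroid computation happen in one pass.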

So, are there any arguments against this approach?
