spark-user mailing list archives

From "Segerlind, Nathan L" <>
Subject RDD.aggregate versus accumulables...
Date Mon, 17 Nov 2014 03:06:39 GMT
Hi All.

I am trying to understand why accumulators and accumulables seem to be the most commonly
recommended way to accumulate running sums, averages, variances and the like, whereas
the aggregate method seems to me to be the right one. I have no performance measurements
yet, but aggregate seems simpler and more intuitive (and it does what one might
expect an accumulator to do), whereas accumulators and accumulables seem to carry some extra
complications and overhead.
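
To make the aggregate pattern mentioned above concrete, here is a minimal sketch in plain Python. It does not use Spark at all: it only imitates the shape of RDD.aggregate (a zero value, a per-element fold, and a merge of per-partition results). The names `partitions`, `seq_op`, `comb_op` and the local `aggregate` helper are hypothetical stand-ins chosen for illustration.

```python
from functools import reduce

# Pure-Python imitation of the aggregate pattern; no Spark involved.
# `partitions` is a hypothetical list of lists standing in for RDD partitions.

def seq_op(acc, x):
    """Fold one element into a partition-local (count, sum, sum of squares)."""
    n, s, ss = acc
    return (n + 1, s + x, ss + x * x)

def comb_op(a, b):
    """Merge two partition-local results component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition from `zero`, then merge the partial results."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)

partitions = [[1.0, 2.0], [3.0, 4.0, 5.0]]
n, s, ss = aggregate(partitions, (0, 0.0, 0.0), seq_op, comb_op)
mean = s / n
variance = ss / n - mean * mean  # population variance from the running sums
```

The point of the sketch is that the count, sum and sum of squares come back together as an ordinary return value, from which the mean and variance follow directly.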


What's the real difference between an accumulator/accumulable and aggregating an RDD? When
is one method of aggregation preferred over the other?

