spark-user mailing list archives

From Nathan Kronenfeld
Subject Another accumulator question
Date Fri, 21 Nov 2014 04:46:53 GMT
I think I understand what is going on here, but I was hoping someone could
confirm (or explain reality if I don't) what I'm seeing.

We are collecting data using a rather sizable accumulator - essentially, an
array of tens of thousands of entries. All told, about 1.3 MB of data.

If I understand things correctly, it looks to me like, when our job is
done, a copy of this array is retrieved from each individual task, all at
once, for combination on the client - which means, with 400 tasks to the
job, each collection is using up half a gig of memory on the client.
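
As a rough check on that estimate, the arithmetic can be sketched in a few
lines of Python. This reads the size quoted above as about 1.3 MB per
accumulator copy (an assumption) and uses the 400 tasks mentioned; the
variable names are only illustrative.

```python
# Back-of-the-envelope estimate of client memory if every task's
# accumulator copy is held at the same time.
ACCUMULATOR_MB = 1.3   # assumed size of one accumulator copy, in MB
NUM_TASKS = 400        # tasks per job, as stated in the message

total_mb = ACCUMULATOR_MB * NUM_TASKS
print(f"{total_mb:.0f} MB held at once")  # roughly half a gigabyte
```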

Is this true?  If so, does anyone know a way to get accumulators to
accumulate as results collect, rather than all at once at the end, so we
only have to hold a few in memory at a time, rather than all 400?
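
To make the pattern being asked about concrete, here is a minimal sketch in
plain Python, not Spark code. It says nothing about how Spark itself delivers
accumulator updates; `task_results` and `merge_incrementally` are hypothetical
names, and the per-task arrays are made-up stand-ins. The point is only that
folding each partial result into a running total as it arrives needs one
partial in memory at a time, rather than all 400.

```python
from typing import Iterable, Iterator, List


def task_results(num_tasks: int, size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for per-task accumulator copies, yielded lazily."""
    for task_id in range(num_tasks):
        yield [task_id] * size


def merge_incrementally(partials: Iterable[List[int]], size: int) -> List[int]:
    """Fold each partial array into a running total as it arrives."""
    total = [0] * size
    for partial in partials:
        for i, value in enumerate(partial):
            total[i] += value
        # Nothing keeps a reference to `partial` past this point, so only
        # the running total and the current partial are alive at once.
    return total


merged = merge_incrementally(task_results(num_tasks=400, size=8), size=8)
print(merged[0])  # sum of 0..399 = 79800
```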


Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
