spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Another accumulator question
Date Fri, 21 Nov 2014 09:44:46 GMT
This sounds more like a use case for reduce or fold. It sounds like
you're cobbling together the same functionality out of accumulators,
when reduce/fold are simpler and have the behavior you're asking for.
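
To make the difference concrete, here is a minimal plain-Python sketch (no
Spark involved) of combining per-task arrays all at once versus folding them
into a single running total as each one arrives. The task count of 400 comes
from the quoted message; task_result, merge, and ARRAY_LEN are hypothetical
names and values chosen only for illustration, and the element-wise sum is an
assumed combine step. In Spark itself the corresponding operations would be
RDD.reduce, RDD.fold, or RDD.aggregate; this sketch only models the idea and
does not show how Spark schedules or merges results internally.

```python
from functools import reduce

NUM_TASKS = 400    # task count mentioned in the quoted message
ARRAY_LEN = 1000   # hypothetical size, kept small so the sketch runs quickly


def task_result(task_id):
    """Hypothetical stand-in for the array a single task produces."""
    return [task_id % 7] * ARRAY_LEN


def merge(acc, part):
    """Element-wise sum into acc, so only one running total is kept."""
    for i, value in enumerate(part):
        acc[i] += value
    return acc


# Collect-then-combine: every partial array is alive at the same time.
all_parts = [task_result(t) for t in range(NUM_TASKS)]
combined_at_end = [sum(column) for column in zip(*all_parts)]

# Fold-style: the generator yields one partial array at a time, and each
# one can be discarded as soon as it has been merged into the total.
partials = (task_result(t) for t in range(NUM_TASKS))
folded = reduce(merge, partials, [0] * ARRAY_LEN)

# Both strategies give the same answer; only the peak memory differs.
assert folded == combined_at_end

# Rough footprint if all copies are held at once, assuming about 1.3 MB each.
peak_mb = NUM_TASKS * 1.3
print(f"all-at-once footprint: about {peak_mb:.0f} MB")
```

With the fold-style version, at most the running total and one incoming
partial array need to be in memory at any moment, which is the behavior the
original question asks for.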

On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld
<nkronenfeld@oculusinfo.com> wrote:
> I think I understand what is going on here, but I was hoping someone could
> confirm (or explain reality if I don't) what I'm seeing.
>
> We are collecting data using a rather sizable accumulator - essentially, an
> array of tens of thousands of entries.  All told, about 1.3 MB of data.
>
> If I understand things correctly, it looks to me like, when our job is done,
> a copy of this array is retrieved from each individual task, all at once,
> for combination on the client - which means, with 400 tasks to the job, each
> collection is using up half a gig of memory on the client.
>
> Is this true?  If so, does anyone know a way to get accumulators to
> accumulate as results collect, rather than all at once at the end, so we
> only have to hold a few in memory at a time, rather than all 400?
>
> Thanks,
>               -Nathan
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  nkronenfeld@oculusinfo.com


