spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Removing broadcasts
Date Thu, 05 Dec 2013 00:50:31 GMT
Hey Roman,

It looks like that pull request was never migrated to the Apache GitHub, but I like the idea.
If you migrate it over, we can merge in something like this. In terms of the API, I’d just
add an unpersist() method on each Broadcast object.
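
To make the proposed API concrete, here is a minimal toy sketch in plain Python of what an unpersist() method on a broadcast handle could look like in an iterative job. It is not Spark's implementation: BlockStore, the simplified Broadcast class, and run_iterations are hypothetical names introduced only for illustration, and the sketch assumes that unpersist() drops the cached copies held by executors while the driver keeps its own copy.

```python
# Toy model only. This is NOT Spark code; every class and function here is a
# hypothetical stand-in used to illustrate the idea of Broadcast.unpersist().

class BlockStore:
    """Stands in for the per-executor cache that holds broadcast values."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id, value):
        self._blocks[block_id] = value

    def remove(self, block_id):
        # Removing a block that is already gone is a no-op.
        self._blocks.pop(block_id, None)

    def size(self):
        return len(self._blocks)


class Broadcast:
    """Simplified broadcast handle with the proposed unpersist() method."""

    _next_id = 0

    def __init__(self, value, stores):
        self.id = Broadcast._next_id
        Broadcast._next_id += 1
        self._value = value      # driver-side copy (assumed to be kept)
        self._stores = stores    # executor-side caches
        for store in stores:
            store.put(self.id, value)

    @property
    def value(self):
        return self._value

    def unpersist(self):
        # Drop the cached copy on every executor; the driver copy stays.
        for store in self._stores:
            store.remove(self.id)


def run_iterations(n_iterations, n_executors=2):
    """Create one broadcast per iteration and release it when done."""
    stores = [BlockStore() for _ in range(n_executors)]
    total = 0
    for i in range(n_iterations):
        weights = Broadcast([i, i + 1], stores)
        total += sum(weights.value)   # stand-in for the real work
        weights.unpersist()           # without this, blocks pile up
    return stores, total
```

With the unpersist() call in place, no broadcast blocks remain cached after the loop; removing that call leaves one block per iteration in each store, which is the memory growth described in the original message.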

Matei

On Dec 3, 2013, at 6:00 AM, Roman Pastukhov <metaignatich@gmail.com> wrote:

> Hi,
> 
> In iterative processes that use broadcasts, they seem to cause memory usage problems because
> they are left in memory. Unfortunately, the only way to remove them now requires reflection hacks.
> 
> TTL-based cleaning would also remove JobConf broadcasts; moreover, it requires each iteration
> to complete within some predefined time frame, so it does not seem like a good option.
> 
> So I was wondering what happened to https://github.com/mesos/spark/pull/771 and whether
> it makes sense to submit similar pull requests?
> 
> PS. TTL cleanup also removes broadcast files on disk. Does this mean that if some RDD
> partition that used an old broadcast needs to be recalculated because of a lost executor,
> the recalculation will fail?

