spark-user mailing list archives

From Aaron Davidson <>
Subject Re: oome from blockmanager
Date Tue, 05 Nov 2013 16:18:12 GMT
As a follow-up on this, the memory footprint of all shuffle metadata has
been greatly reduced. For your original workload with 7k mappers, 7k
reducers, and 5 machines, the total metadata size should have decreased
from ~3.3 GB to ~80 MB.
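For a rough sense of where those totals come from, here is a back-of-envelope sketch. The per-entry byte counts are assumptions chosen to reproduce the figures quoted above, not numbers measured from the Spark source:

```python
# Back-of-envelope estimate of shuffle-metadata size. The BlockManager
# tracks state per (mapper, reducer) shuffle block, so metadata grows
# with mappers * reducers. The per-entry byte counts below are assumed
# values that reproduce the ~3.3 GB and ~80 MB totals quoted above;
# they are illustrative only.

def shuffle_metadata_bytes(mappers, reducers, bytes_per_entry):
    return mappers * reducers * bytes_per_entry

before = shuffle_metadata_bytes(7000, 7000, 67)   # ~67 bytes/entry (assumed)
after = shuffle_metadata_bytes(7000, 7000, 1.6)   # ~1.6 bytes/entry (assumed)

print(f"before: {before / 1e9:.1f} GB")  # before: 3.3 GB
print(f"after:  {after / 1e6:.0f} MB")   # after:  78 MB
```

The point of the sketch is the quadratic term: at 7k x 7k there are 49 million shuffle blocks, so even a few dozen bytes of bookkeeping per block adds up to gigabytes.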

On Tue, Oct 29, 2013 at 9:07 AM, Aaron Davidson <> wrote:

> Great! Glad to hear it worked out. Spark definitely has a pain point around
> choosing the right number of partitions, and I think we're going to be
> spending a lot of time trying to reduce that issue.
> I'm currently working on the patch to reduce the shuffle file block overheads,
> but in the meantime, you can set "spark.shuffle.consolidateFiles=false" to
> trade OOMEs caused by too many partitions for worse performance (probably
> an acceptable tradeoff).
> On Mon, Oct 28, 2013 at 2:31 PM, Stephen Haberman <> wrote:
>> Hey guys,
>> As a follow up, I raised our target partition size to 600mb (up from
>> 64mb), which split this report's 500gb of tiny S3 files into ~700
>> partitions, and everything ran much smoother.
>> In retrospect, this was the same issue we'd run into before, having too
>> many partitions, which we'd previously solved by throwing some guesses at
>> coalesce to make it magically go away.
>> But now I feel like we have a much better understanding of why the
>> numbers need to be what they are, which is great.
>> So, thanks for all the input and helping me understand what's going on.
>> It'd be great to see some of the optimizations to BlockManager happen,
>> but I understand in the end why it needs to track what it does. And I
>> was also admittedly using a small cluster anyway.
>> - Stephen
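Stephen's partition-count arithmetic above generalizes to a small helper. This is a minimal sketch; the 500 GB and 600 MB figures come from his message, and the helper itself is hypothetical, not part of any Spark API:

```python
import math

def target_partitions(total_mb, target_partition_mb):
    """Partitions needed so each holds roughly target_partition_mb."""
    return max(1, math.ceil(total_mb / target_partition_mb))

# 500 GB of input at a ~600 MB target partition size:
n = target_partitions(500 * 1024, 600)
print(n)  # 854
```

This lands in the same ballpark as the ~700 partitions Stephen observed (actual counts depend on input file sizes and splits). In a Spark job, the result would typically be applied with `rdd.coalesce(n)`.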
