spark-dev mailing list archives

From Andrew Or <>
Subject Re: Spark shuffle consolidateFiles performance degradation numbers
Date Tue, 04 Nov 2014 02:12:34 GMT
Hey Matt,

There is some prior work comparing consolidation performance on a
medium-scale workload:

There we noticed about a 2x performance degradation in the reduce phase on
ext3. I am not aware of any other concrete numbers. Maybe others have more
experience to add.


2014-11-03 17:26 GMT-08:00 Matt Cheah <>:

> Hi everyone,
> I'm running into more and more cases where too many files are opened when
> spark.shuffle.consolidateFiles is turned off.
> I was wondering if this is a common scenario among the rest of the
> community, and if so, if it is worth considering the setting to be turned
> on by default. From the documentation, it seems like the performance could
> be hurt on ext3 file systems. However, what are the concrete numbers for
> the performance degradation that is typically seen? A 2x slowdown in the
> average job? 3x? Also, what causes the performance degradation on ext3 file
> systems specifically?
> Thanks,
> -Matt Cheah
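
For context on the "too many files opened" symptom in the quoted message, here is a rough back-of-the-envelope sketch in Python. It assumes the hash-based shuffle, where each map task writes one file per reducer, and assumes that consolidation lets map tasks that run one after another on the same core reuse a single group of per-reducer files. The function names and the example numbers are illustrative only and do not come from the thread.

```python
def files_without_consolidation(num_map_tasks: int, num_reducers: int) -> int:
    """Shuffle files written on a node when every map task
    creates its own file for each reducer (assumed model)."""
    return num_map_tasks * num_reducers


def files_with_consolidation(num_cores: int, num_reducers: int) -> int:
    """Shuffle files written on a node when map tasks sharing a core
    append to one file per reducer (assumed model)."""
    return num_cores * num_reducers


if __name__ == "__main__":
    # Hypothetical node: 1000 map tasks scheduled on it, 200 reducers, 8 cores.
    map_tasks, reducers, cores = 1000, 200, 8
    print("without consolidation:", files_without_consolidation(map_tasks, reducers))
    print("with consolidation:   ", files_with_consolidation(cores, reducers))
```

Under these assumptions the file count without consolidation grows with the number of map tasks, while with consolidation it depends only on cores and reducers, which is why turning the setting off can run into open-file limits on larger jobs.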
