spark-user mailing list archives

From Han JU <ju.han.fe...@gmail.com>
Subject Re: Shuffle file consolidation
Date Fri, 23 May 2014 14:13:08 GMT
Hi Nathan,

There's an explanation of `spark.shuffle.consolidateFiles` in the Spark configuration documentation:

```
If set to "true", consolidates intermediate files created during a shuffle.
Creating fewer files can improve filesystem performance for shuffles with
large numbers of reduce tasks. It is recommended to set this to "true" when
using ext4 or xfs filesystems. On ext3, this option might degrade
performance on machines with many (>8) cores due to filesystem limitations.
```
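To put rough numbers on "creating fewer files", here is a back-of-the-envelope sketch. It assumes the hash-based shuffle, where each map task writes one file per reduce task. It also assumes that with consolidation enabled, map tasks that run one after another on the same core append to a shared group of files, one per reduce task. The function names and the cluster sizes are hypothetical and only illustrate the scaling.

```python
def shuffle_files_without_consolidation(map_tasks: int, reduce_tasks: int) -> int:
    """Assumed model: every map task writes one file per reduce task."""
    return map_tasks * reduce_tasks


def shuffle_files_with_consolidation(nodes: int, cores_per_node: int, reduce_tasks: int) -> int:
    """Assumed model: each core reuses one group of files (one per reduce task)
    across all the map tasks it runs, so the count no longer grows with map tasks."""
    return nodes * cores_per_node * reduce_tasks


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes x 8 cores, 2,000 map tasks, 1,000 reduce tasks.
    maps, reduces, nodes, cores = 2000, 1000, 10, 8
    print(shuffle_files_without_consolidation(maps, reduces))          # 2000000
    print(shuffle_files_with_consolidation(nodes, cores, reduces))     # 80000
```

Under these assumptions, the file count drops from scaling with the number of map tasks to scaling with the number of cores. That is why the benefit shows up mostly for shuffles with many map and reduce tasks.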


2014-05-23 16:00 GMT+02:00 Nathan Kronenfeld <nkronenfeld@oculusinfo.com>:

> In trying to sort some largish datasets, we came across the
> spark.shuffle.consolidateFiles property, and I found in the source code
> that it is set, by default, to false, with a note to default it to true
> when the feature is stable.
>
> Does anyone know what is unstable about this? If we set it true, what
> problems should we anticipate?
>
> Thanks,
>             -Nathan Kronenfeld
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  nkronenfeld@oculusinfo.com
>



-- 
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
