spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <>
Subject [jira] [Updated] (SPARK-741) DiskStore should use > 8kB buffer when doing writes
Date Sun, 30 Mar 2014 04:15:27 GMT


Patrick Wendell updated SPARK-741:

    Reporter: Patrick Wendell  (was: Patrick Cogan)

> DiskStore should use > 8kB buffer when doing writes
> ---------------------------------------------------
>                 Key: SPARK-741
>                 URL:
>             Project: Apache Spark
>          Issue Type: Improvement
>            Reporter: Patrick Wendell
>            Assignee: Reynold Xin
>             Fix For: 0.8.0
> Right now the DiskStore uses a buffered output stream with the default buffer size of
> 8kB. This can hurt disk throughput by a substantial amount when there are several shuffle
> files being output at once (either due to a large # of concurrent tasks or a large # of
> output splits).
> We should avoid increasing this buffer arbitrarily because it is currently instantiated
> (# tasks * # splits) times, which could be large. The best approach is probably to do
> something like this:
> - By default, give each task 10MB of total buffer space, divided up amongst its output
>   splits.
> - If this means each split buffer is < 8kB, bump it up to at least 8kB (we'd rather OOM
>   than have terrible disk throughput, so at least people can figure out what's wrong).
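
The sizing rule proposed above comes down to a little arithmetic. What follows is a minimal sketch of that arithmetic in Python, not the DiskStore code itself: the function and constant names are hypothetical, and the 10MB and 8kB figures are read as binary units (10 * 1024 * 1024 and 8 * 1024 bytes), which is an assumption.

```python
# Hypothetical names; only the 10MB-per-task budget and the 8kB floor
# come from the issue description.
DEFAULT_TASK_BUFFER_BYTES = 10 * 1024 * 1024  # assumed meaning of "10MB"
MIN_SPLIT_BUFFER_BYTES = 8 * 1024             # assumed meaning of "8kB"


def split_buffer_size(num_splits,
                      task_buffer_bytes=DEFAULT_TASK_BUFFER_BYTES,
                      min_bytes=MIN_SPLIT_BUFFER_BYTES):
    """Divide a task's buffer budget evenly among its output splits,
    but never go below the minimum buffer size."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    return max(task_buffer_bytes // num_splits, min_bytes)


def total_buffer_bytes(num_tasks, num_splits):
    """Total buffer memory when every task holds one buffer per split,
    i.e. (# tasks * # splits) buffers."""
    return num_tasks * num_splits * split_buffer_size(num_splits)


if __name__ == "__main__":
    for splits in (10, 1280, 5000):
        print(splits, split_buffer_size(splits), total_buffer_bytes(8, splits))
```

With few splits, each buffer is far larger than 8kB and the per-task total stays at the budget. Once the even share would drop below 8kB, the floor applies and the per-task total grows past the budget, which is the "rather OOM than have terrible disk throughput" trade-off described above.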

This message was sent by Atlassian JIRA