spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-741) DiskStore should use > 8kB buffer when doing writes
Date Sun, 30 Mar 2014 04:15:27 GMT

     [ https://issues.apache.org/jira/browse/SPARK-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-741:
----------------------------------

    Reporter: Patrick Wendell  (was: Patrick Cogan)

> DiskStore should use > 8kB buffer when doing writes
> ---------------------------------------------------
>
>                 Key: SPARK-741
>                 URL: https://issues.apache.org/jira/browse/SPARK-741
>             Project: Apache Spark
>          Issue Type: Improvement
>            Reporter: Patrick Wendell
>            Assignee: Reynold Xin
>             Fix For: 0.8.0
>
>
> Right now the DiskStore uses a buffered output stream with the default buffer size of 8kB. This can hurt disk throughput substantially when several shuffle files are being written at once (due to either a large number of concurrent tasks or a large number of output splits).
> We should avoid increasing this buffer arbitrarily because it is currently instantiated (# tasks * # splits) times, which could be large. The best approach is probably something like this:
> - By default, give each task 10 MB of total buffer space, divided among its output partitions.
> - If this means each split buffer is < 8kB, bump it up to at least 8kB (we'd rather OOM than have terrible disk throughput, so at least people can figure out what's wrong).
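
The sizing rule proposed in the issue can be sketched as follows. This is only an illustration of the arithmetic, not Spark's DiskStore code: the class `SplitBufferSizer`, its constants, and its methods are hypothetical names, and it assumes each output split is written through a `java.io.BufferedOutputStream`.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;

/** Hypothetical helper illustrating the proposal; not part of Spark. */
final class SplitBufferSizer {
    /** Per-task budget suggested in the issue: 10 MB (assumed to mean 10 * 1024 * 1024 bytes). */
    static final int TASK_BUFFER_BYTES = 10 * 1024 * 1024;

    /** Floor suggested in the issue: 8 kB, the same as BufferedOutputStream's default. */
    static final int MIN_SPLIT_BUFFER_BYTES = 8 * 1024;

    private SplitBufferSizer() {}

    /** Divides the task budget evenly among its splits, never going below the floor. */
    static int perSplitBufferBytes(int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive: " + numSplits);
        }
        int evenShare = TASK_BUFFER_BYTES / numSplits;
        return Math.max(evenShare, MIN_SPLIT_BUFFER_BYTES);
    }

    /** Wraps a raw per-split stream in a buffer sized by the rule above. */
    static OutputStream wrap(OutputStream raw, int numSplits) {
        return new BufferedOutputStream(raw, perSplitBufferBytes(numSplits));
    }
}
```

Under these assumptions a task with 10 splits gets a 1 MB buffer per split, while a task with 5000 splits falls back to the 8 kB floor and uses about 39 MB in total, exceeding the 10 MB budget. That is the trade-off the issue accepts: risk running out of memory rather than silently degrade disk throughput.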



--
This message was sent by Atlassian JIRA
(v6.2#6252)
