spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-3000) Drop old blocks to disk in parallel when memory is not large enough for caching new blocks
Date Sat, 21 May 2016 20:07:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen updated SPARK-3000:
-----------------------------
    Target Version/s:   (was: 2.0.0)

> Drop old blocks to disk in parallel when memory is not large enough for caching new blocks
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3000
>                 URL: https://issues.apache.org/jira/browse/SPARK-3000
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager, Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Zhang, Liye
>            Assignee: Josh Rosen
>         Attachments: Spark-3000 Design Doc.pdf
>
>
> In Spark, an RDD can be cached in memory for later use. The memory available for caching
> is "*spark.executor.memory * spark.storage.memoryFraction*" for Spark versions before 1.1.0,
> and "*spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction*"
> after [SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777].
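
The arithmetic above is easy to check with a quick Scala sketch. The 8 GiB executor size is an illustrative assumption, and 0.6 / 0.9 are the historical defaults for *spark.storage.memoryFraction* and *spark.storage.safetyFraction*; none of these figures come from this ticket:

```scala
// Hypothetical cache-capacity calculation for an 8 GiB executor using the
// pre-Spark-2.0 static memory model described in this ticket.
object CacheCapacity {
  val executorMemoryBytes: Long = 8L * 1024 * 1024 * 1024 // 8 GiB
  val memoryFraction = 0.6   // spark.storage.memoryFraction (old default)
  val safetyFraction = 0.9   // spark.storage.safetyFraction (old default)

  // Before 1.1.0: executor memory * memoryFraction
  val pre110: Long = (executorMemoryBytes * memoryFraction).toLong

  // After SPARK-1777: additionally scaled by safetyFraction
  val post1777: Long = (executorMemoryBytes * memoryFraction * safetyFraction).toLong

  def main(args: Array[String]): Unit = {
    println(s"pre-1.1.0 cache capacity:  ${pre110 / (1024 * 1024)} MiB")
    println(s"post-SPARK-1777 capacity: ${post1777 / (1024 * 1024)} MiB")
  }
}
```

With these assumed defaults, SPARK-1777 shrinks the usable cache from roughly 4915 MiB to roughly 4423 MiB.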

> For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks,
> old blocks may be dropped to disk to free up memory for the new ones. This is handled by
> _ensureFreeSpace_ in _MemoryStore.scala_, whose caller always holds the *accountingLock*,
> so only one thread drops blocks at a time. This approach cannot fully utilize the disks'
> throughput when the worker node has multiple disks. When testing our workload, we found
> this to be a real bottleneck whenever the total size of old blocks to be dropped is large.

> We have tested the parallel method on Spark 1.0 and the speedup is significant, so it is
> worthwhile to make the block-dropping operation parallel.
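
The core of the proposal can be sketched in a few lines of Scala. This is a toy illustration of the idea (submit each drop-to-disk as an independent task so writes to different disks can proceed concurrently), not Spark's actual _MemoryStore_ or _BlockManager_ code; the `Block` type, the simulated disk write, and the sizes are all made up:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Sketch: drop victim blocks to disk in parallel instead of serially
// under a single accounting lock.
object ParallelDrop {
  final case class Block(id: String, sizeBytes: Long)

  // Placeholder for the real disk write; here we only simulate latency.
  def dropToDisk(b: Block): Unit = Thread.sleep(10)

  // Submits one drop task per victim block and returns the bytes freed.
  def dropAll(victims: Seq[Block], parallelism: Int): Long = {
    val pool  = Executors.newFixedThreadPool(parallelism)
    val freed = new AtomicLong(0)
    victims.foreach { b =>
      pool.submit(new Runnable {
        def run(): Unit = { dropToDisk(b); freed.addAndGet(b.sizeBytes) }
      })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    freed.get()
  }

  def main(args: Array[String]): Unit = {
    val victims = (1 to 8).map(i => Block(s"rdd_0_$i", 1024L * i))
    println(s"freed ${dropAll(victims, parallelism = 4)} bytes")
  }
}
```

A real implementation would still need per-block locking so that a block being dropped is not read or re-cached concurrently; this sketch only shows the throughput side of the argument.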



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

