spark-issues mailing list archives

From "Josh Rosen (JIRA)"
Subject [jira] [Created] (SPARK-12757) Use reference counting to prevent blocks from being evicted during reads
Date Mon, 11 Jan 2016 20:31:39 GMT
Josh Rosen created SPARK-12757:

             Summary: Use reference counting to prevent blocks from being evicted during reads
                 Key: SPARK-12757
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager
            Reporter: Josh Rosen
            Assignee: Josh Rosen

As a prerequisite to off-heap caching of blocks, we need a mechanism to prevent pages/blocks
from being evicted while they are being read. With on-heap objects, evicting a block while
it is being read merely leads to memory-accounting problems (because we assume that an evicted
block is a candidate for garbage collection, which will not be true during a read), but with
off-heap memory this will lead to either data corruption or segmentation faults.

To address this, we should add a reference-counting mechanism to track which blocks/pages
are being read in order to prevent them from being evicted prematurely. I propose to do this
in two phases. In phase one, we add a safe, conservative approach in which all BlockManager.get*()
calls implicitly increment the reference count of blocks and where tasks' references are
automatically freed upon task completion. This will be correct but may have adverse performance
impacts because it will prevent legitimate block evictions. In phase two, we should incrementally
add release() calls so that blocks which are no longer being read become evictable again. The
latter change may need to touch many different components, which is why I propose to do it
separately in order to make the changes easier to reason about and review.
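
To make the two phases concrete, here is a minimal, language-neutral sketch written in Python.
It is not Spark's BlockManager code: the class RefCountedBlockStore and the methods put(),
release_all_for_task() and evict() are hypothetical names chosen for illustration, and a single
lock stands in for whatever synchronization the real implementation would use.

```python
import threading
from collections import Counter, defaultdict


class RefCountedBlockStore:
    """Hypothetical block store with per-task reference counts (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = {}                        # block_id -> data
        self._ref_counts = Counter()             # block_id -> total live references
        self._task_refs = defaultdict(Counter)   # task_id -> {block_id -> references}

    def put(self, block_id, data):
        with self._lock:
            self._blocks[block_id] = data

    def get(self, task_id, block_id):
        """Phase one: every read implicitly takes a reference for the calling task."""
        with self._lock:
            data = self._blocks.get(block_id)
            if data is not None:
                self._ref_counts[block_id] += 1
                self._task_refs[task_id][block_id] += 1
            return data

    def release(self, task_id, block_id):
        """Phase two: a caller drops one reference as soon as it has finished reading."""
        with self._lock:
            held = self._task_refs.get(task_id)
            if held is None or held[block_id] <= 0:
                raise ValueError(f"task {task_id} holds no reference to {block_id}")
            held[block_id] -= 1
            self._ref_counts[block_id] -= 1

    def release_all_for_task(self, task_id):
        """Phase one safety net: free whatever the task still holds when it completes."""
        with self._lock:
            held = self._task_refs.pop(task_id, Counter())
            for block_id, count in held.items():
                self._ref_counts[block_id] -= count

    def evict(self, block_id):
        """Refuse to evict a block that is still referenced; return True if evicted."""
        with self._lock:
            if self._ref_counts[block_id] > 0:
                return False
            return self._blocks.pop(block_id, None) is not None
```

With only get() and release_all_for_task() in use, a block stays pinned until the reading task
finishes, which is the conservative behavior of phase one. Adding release() calls at the points
where a read is known to be done lets evict() succeed earlier, which is the phase two refinement.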
