spark-issues mailing list archives

From "holdenk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-22083) When dropping multiple blocks to disk, Spark should release all locks on a failure
Date Tue, 03 Oct 2017 06:11:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

holdenk updated SPARK-22083:
----------------------------
    Fix Version/s: 2.1.2

> When dropping multiple blocks to disk, Spark should release all locks on a failure
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22083
>                 URL: https://issues.apache.org/jira/browse/SPARK-22083
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>             Fix For: 2.1.2, 2.2.1, 2.3.0, 2.1.3
>
>
> {{MemoryStore.evictBlocksToFreeSpace}} first [acquires writer locks on all the blocks
> it intends to evict|https://github.com/apache/spark/blob/55d5fa79db883e4d93a9c102a94713c9d2d1fb55/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L520].
> However, if there is an exception while dropping blocks, there is no {{finally}} block to
> release all the locks.
> If there is only one block being dropped, this isn't a problem (probably).  Usually the
> call stack goes from {{MemoryStore.evictBlocksToFreeSpace --> dropBlocks --> BlockManager.dropFromMemory --> DiskStore.put}}.
> And {{DiskStore.put}} does do a [{{removeBlock()}} in a {{finally}} block|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/DiskStore.scala#L83],
> which cleans up the locks.
> I ran into this from the serialization issue in SPARK-21928.  In that, a netty thread
> ends up trying to evict some blocks from memory to disk, and fails.  When there is only one
> block that needs to be evicted, and the error occurs, there isn't any real problem; I assume
> that netty thread is dead, but the executor threads seem fine.  However, in the cases where
> two blocks get dropped, one task gets completely stuck.  Unfortunately I don't have a stack
> trace from the stuck executor, but I assume it just waits forever on this lock that never
> gets released.
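
The pattern described in the report (acquire locks on every block up front, then release whatever is still held if a drop fails) can be sketched roughly as follows. This is only an illustration, not the actual Spark fix: {{EvictionSketch}}, {{Block}}, {{evictAll}} and {{dropToDisk}} are hypothetical names, and plain {{java.util.concurrent.locks.ReentrantLock}} objects stand in for Spark's own block write locks.

```scala
import java.util.concurrent.locks.ReentrantLock

// Hypothetical stand-in for the eviction path; none of these names are Spark APIs.
object EvictionSketch {
  // A block guarded by its own lock (assumption: ReentrantLock replaces Spark's block locks).
  final class Block(val id: String) {
    val lock = new ReentrantLock()
  }

  /** Locks every block, drops them one by one, and releases all remaining locks on failure. */
  def evictAll(blocks: Seq[Block])(dropToDisk: Block => Unit): Unit = {
    // Acquire a lock on every block selected for eviction before dropping any of them.
    blocks.foreach(_.lock.lock())
    var dropped = 0
    try {
      blocks.foreach { b =>
        dropToDisk(b)     // may throw, e.g. on a serialization or disk error
        b.lock.unlock()   // a successful drop releases its own lock
        dropped += 1
      }
    } finally {
      // Without this step, a failure on one block leaves the later blocks locked forever.
      blocks.drop(dropped).foreach { b =>
        if (b.lock.isHeldByCurrentThread) b.lock.unlock()
      }
    }
  }
}
```

With the {{finally}} step removed, a failure on the first of two blocks would leave the second block's lock held, which matches the stuck-task symptom described above.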



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


