spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Updated] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`
Date Wed, 18 May 2016 11:12:12 GMT


Sean Owen updated SPARK-15263:
    Assignee: Tejas Patil

> Make shuffle service dir cleanup faster by using `rm -rf`
> ---------------------------------------------------------
>                 Key: SPARK-15263
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Minor
>             Fix For: 2.1.0
> The current logic for directory cleanup (JavaUtils.deleteRecursively) is slow because
> it does a directory listing, recurses over child directories, checks for symbolic links, deletes
> leaf files and finally deletes the directories once they are empty. There is back-and-forth switching
> between kernel space and user space while doing this. Since most deployment backends
> are Unix systems, we could simply do `rm -rf` so that the entire deletion logic runs in
> kernel space.
> The current Java-based implementation in Spark seems similar to what standard libraries like
> Guava and Commons IO do (eg.
> However, Guava removed this method in favour of shelling out to an operating system command
> (which is exactly what I am proposing). See the Deprecated note in the older Guava javadocs
> for details:
> Ideally, Java should provide such APIs so that users won't have to write such
> platform-specific code. Also, it's not just about speed: handling race conditions
> during filesystem deletions is tricky. I could find this bug for Java in a similar context:
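A minimal sketch of the proposed approach, shelling out to `rm -rf` via `ProcessBuilder` on Unix-like systems. The class and method names here are hypothetical, not the actual Spark patch; the real change lives in Spark's `JavaUtils`:

```java
import java.io.File;
import java.io.IOException;

public class FastDelete {

    // Hypothetical helper: delegate recursive deletion to `rm -rf` so the
    // traversal and unlinking happen in native code instead of the JVM
    // walking the tree itself. Assumes a Unix-like system with `rm` on PATH.
    public static void deleteRecursivelyFast(File dir)
            throws IOException, InterruptedException {
        if (dir == null || !dir.exists()) {
            return;
        }
        Process p = new ProcessBuilder("rm", "-rf", dir.getAbsolutePath())
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(
                "rm -rf exited with code " + exit + " for " + dir);
        }
    }

    public static void main(String[] args) throws Exception {
        // Create a small nested directory tree, then delete it in one call.
        File tmp = new File(System.getProperty("java.io.tmpdir"),
                "shuffle-cleanup-demo");
        new File(tmp, "a/b").mkdirs();
        deleteRecursivelyFast(tmp);
        System.out.println("exists after delete: " + tmp.exists());
    }
}
```

A production version would presumably fall back to the Java-based recursive delete on non-Unix platforms, since `rm` is not available there.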

This message was sent by Atlassian JIRA
