spark-issues mailing list archives

From "Pat Ferrel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4796) Spark does not remove temp files
Date Wed, 12 Aug 2015 01:00:49 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692692#comment-14692692 ]

Pat Ferrel commented on SPARK-4796:
-----------------------------------

Why is this marked resolved? Spark does indeed leave a lot of files around, and unless you
go looking you'd never know. It sounds like the only safe way to remove them is to shut down
Spark and delete them.

I skimmed the issue so sorry if I missed something. 15G on the MBP and counting :-)
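
For anyone who wants to check their own machine, here is a rough sketch that tallies what is sitting in Spark's local directories. It assumes the `/tmp/spark-local-*` naming from the report quoted below (the location can differ by platform and configuration), and `tally_spark_local` is just an illustrative name, not anything Spark ships. It classifies files by name, which is slightly narrower than the `fgrep shuffle` on full paths used in the report.

```python
import glob
import os


def tally_spark_local(pattern="/tmp/spark-local-*"):
    """Count files and bytes under matching dirs, split into shuffle vs. other.

    The default pattern is an assumption taken from the quoted report.
    """
    totals = {"shuffle": [0, 0], "other": [0, 0]}  # kind -> [file count, bytes]
    for root_dir in glob.glob(pattern):
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                kind = "shuffle" if "shuffle" in name else "other"
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue  # file vanished while we were walking
                totals[kind][0] += 1
                totals[kind][1] += size
    return totals


if __name__ == "__main__":
    for kind, (count, size) in tally_spark_local().items():
        print(f"{kind}: {count} files, {size / 2**20:.1f} MiB")
```

This only reads the directories, so it should be safe to run while Spark is up.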


> Spark does not remove temp files
> --------------------------------
>
>                 Key: SPARK-4796
>                 URL: https://issues.apache.org/jira/browse/SPARK-4796
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.1.0
>         Environment: I'm running Spark on Mesos, and the Mesos slaves are Docker containers.
> Spark 1.1.0, elasticsearch spark 2.1.0-Beta3, mesos 0.20.0, docker 1.2.0.
>            Reporter: Ian Babrou
>
> I started a job that does not fit into memory and got "no space left on device". That
> was fair, because the Docker containers only have 10 GB of disk space and some is already taken by the OS.
> But then I found that when the job failed it did not release any disk space, leaving the container
> without any free disk space.
> Then I decided to check whether Spark removes temp files at all, because many Mesos slaves
> had /tmp/spark-local-*. Apparently some garbage stays after a Spark task finishes. I attached
> strace to a running job:
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/12/temp_8a73fcc2-4baa-499a-8add-0161f918de8a") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/31/temp_47efd04b-d427-4139-8f48-3d5d421e9be4") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/15/temp_619a46dc-40de-43f1-a844-4db146a607c6") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/05/temp_d97d90a7-8bc1-4742-ba9b-41d74ea73c36" <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/36/temp_a2deb806-714a-457a-90c8-5d9f3247a5d7") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/04/temp_afd558f1-2fd0-48d7-bc65-07b5f4455b22") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/32/temp_a7add910-8dc3-482c-baf5-09d5a187c62a" <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/21/temp_485612f0-527f-47b0-bb8b-6016f3b9ec19") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/12/temp_bb2b4e06-a9dd-408e-8395-f6c5f4e2d52f") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/1e/temp_825293c6-9d3b-4451-9cb8-91e2abe5a19d" <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/15/temp_43fbb94c-9163-4aa7-ab83-e7693b9f21fc") = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/3d/temp_37f3629c-1b09-4907-b599-61b7df94b898" <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] unlink("/tmp/spark-local-20141209091330-48b5/35/temp_d18f49f6-1fb1-4c01-a694-0ee0a72294c0") = 0
> After the job finished, some files were still there:
> /tmp/spark-local-20141209091330-48b5/
> /tmp/spark-local-20141209091330-48b5/11
> /tmp/spark-local-20141209091330-48b5/11/shuffle_0_1_4
> /tmp/spark-local-20141209091330-48b5/32
> /tmp/spark-local-20141209091330-48b5/04
> /tmp/spark-local-20141209091330-48b5/05
> /tmp/spark-local-20141209091330-48b5/0f
> /tmp/spark-local-20141209091330-48b5/0f/shuffle_0_1_2
> /tmp/spark-local-20141209091330-48b5/3d
> /tmp/spark-local-20141209091330-48b5/0e
> /tmp/spark-local-20141209091330-48b5/0e/shuffle_0_1_1
> /tmp/spark-local-20141209091330-48b5/15
> /tmp/spark-local-20141209091330-48b5/0d
> /tmp/spark-local-20141209091330-48b5/0d/shuffle_0_1_0
> /tmp/spark-local-20141209091330-48b5/36
> /tmp/spark-local-20141209091330-48b5/31
> /tmp/spark-local-20141209091330-48b5/12
> /tmp/spark-local-20141209091330-48b5/21
> /tmp/spark-local-20141209091330-48b5/10
> /tmp/spark-local-20141209091330-48b5/10/shuffle_0_1_3
> /tmp/spark-local-20141209091330-48b5/1e
> /tmp/spark-local-20141209091330-48b5/35
> Looking at my Mesos slaves, the leftovers are mostly "shuffle" files. The overall picture for
> a single node:
> root@web338:~# find /tmp/spark-local-20141* -type f | fgrep shuffle | wc -l
> 781
> root@web338:~# find /tmp/spark-local-20141* -type f | fgrep -v shuffle | wc -l
> 10
> root@web338:~# find /tmp/spark-local-20141* -type f | fgrep -v shuffle
> /tmp/spark-local-20141119144512-67c4/2d/temp_9056f380-3edb-48d6-a7df-d4896f1e1cc3
> /tmp/spark-local-20141119144512-67c4/3d/temp_e005659b-eddf-4a34-947f-4f63fcddf111
> /tmp/spark-local-20141119144512-67c4/16/temp_71eba702-36b4-4e1a-aebc-20d2080f1705
> /tmp/spark-local-20141119144512-67c4/0d/temp_8037b9db-2d8a-4786-a554-a8cad922bf5e
> /tmp/spark-local-20141119144512-67c4/24/temp_f0e4cc43-6cc9-42a7-882d-f8a031fa4dc3
> /tmp/spark-local-20141119144512-67c4/29/temp_a8bbe2cb-f590-4b71-8ef8-9c0324beddc7
> /tmp/spark-local-20141119144512-67c4/3a/temp_9fc08519-f23a-40ac-a3fd-e58df6871460
> /tmp/spark-local-20141119144512-67c4/1e/temp_d66668ab-2999-48af-a136-84cfd6f5f6cb
> /tmp/spark-local-20141205110922-f78e/0a/temp_7409add5-e6ff-46e5-ae3f-6a4c7b2ddf8f
> /tmp/spark-local-20141205111026-0b53/01/temp_72024c94-7512-4692-8bd1-ef2417143d8c
> Conclusions:
> 1. Shuffle files should be removed, but they stay.
> 2. Temp files should always be removed, but they stay.
> Maybe we should unlink temp and shuffle files immediately after creation, so that they are removed
> even if Spark fails.
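
To make the unlink-after-creation idea concrete, here is a minimal POSIX-only sketch in Python. It is not how Spark handles its files; `open_unlinked_temp` and the `temp_` prefix are hypothetical names chosen to mirror the listing above. The file's directory entry is removed right after it is created, but the data stays reachable through the open descriptor, and the kernel reclaims the space once the descriptor is closed or the process dies.

```python
import os
import tempfile


def open_unlinked_temp(directory=None):
    """Create a temp file, unlink it at once, and return (file object, old path).

    POSIX only: an unlinked file stays usable through its open descriptor,
    and its disk blocks are freed when the last descriptor is closed.
    """
    fd, path = tempfile.mkstemp(prefix="temp_", dir=directory)
    try:
        os.unlink(path)  # remove the directory entry; the inode lives on
    except OSError:
        os.close(fd)
        raise
    return os.fdopen(fd, "w+b"), path


if __name__ == "__main__":
    f, path = open_unlinked_temp()
    f.write(b"spill data")
    f.seek(0)
    # The path is gone, yet the contents are still readable via the handle.
    print(os.path.exists(path), f.read())
    f.close()
```

The catch is that nothing else can reopen the file by name afterwards, so this only fits files that are read back through the same descriptor. Whether that holds for shuffle files, as opposed to temp files, is not something the report above establishes.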



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


