spark-dev mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: How to clear spark Shuffle files
Date Mon, 14 Sep 2020 21:25:03 GMT
There's a second new mechanism which uses TTL for cleanup of shuffle files.
Can you share more about your use case?

On Mon, Sep 14, 2020 at 1:33 PM Edward Mitchell <edeesis@gmail.com> wrote:

> We've also had some similar disk fill issues.
>
> For Java/Scala RDDs, shuffle file cleanup is done as part of JVM
> garbage collection. I've noticed that if the code keeps references to
> RDDs, so that they cannot be garbage collected, then the intermediate
> shuffle files hang around.
>
> The best way to handle this is to organize your code so that when an RDD
> is finished, it falls out of scope and can therefore be garbage
> collected.
>
> There's also an experimental API, created in Spark 3 (I think), that
> gives you more granular control by letting you call a method to clean up
> the shuffle files.
>
> On Mon, Sep 14, 2020 at 11:02 AM lsn248 <lekshmi.sony@gmail.com> wrote:
>
>> Hi,
>>
>> I have a long-running application, and Spark seems to fill up the disk
>> with shuffle files. Eventually the job fails because it runs out of disk
>> space. Is there a way for me to clean up the shuffle files?
>>
>> Thanks
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
