Hi Keith,

I don't think that we keep such references. 
But we do experience exceptions during the job execution that we catch and retry (timeouts/network issues from different data sources).
Can they affect RDD cleanup?


Hi Alex,

Shuffle files in Spark are deleted when the object holding a reference to the shuffle file on disk goes out of scope (that is, when it is garbage collected by the JVM). Could it be the case that you are keeping these objects alive?
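
To illustrate what "keeping these objects alive" could look like, here is a minimal sketch, assuming a local master and a hypothetical helper named runQuickJob (neither is from your setup). The shuffled RDD is referenced only inside the helper, so once the helper returns, the driver holds no reference to it and a later driver-side GC can make its shuffle eligible for cleanup. If the RDD were instead stored in a long-lived field or collection, that would not happen.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleReferenceSketch {
  // Hypothetical helper: runs one quick job and returns only the result,
  // so no reference to the shuffled RDD escapes this method.
  def runQuickJob(sc: SparkContext): Long = {
    val counts = sc.parallelize(1 to 1000)
      .map(x => (x % 10, 1))
      .reduceByKey(_ + _) // reduceByKey introduces a shuffle
    counts.count()
  }

  def main(args: Array[String]): Unit = {
    // App name and local master are illustrative assumptions.
    val conf = new SparkConf()
      .setAppName("shuffle-reference-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val n = runQuickJob(sc)
      println(s"distinct keys: $n")
      // Only a hint to the JVM; it does not guarantee that a collection
      // (and therefore any shuffle cleanup) happens at this point.
      System.gc()
    } finally {
      sc.stop()
    }
  }
}
```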

I looked into these options. The cleaner's periodic GC interval (spark.cleaner.periodicGC.interval) is set to 30 min by default.
The blocking option for shuffle cleanup, spark.cleaner.referenceTracking.blocking.shuffle, is set to false by default.
What are the implications of setting it to true? 
Will it make the driver slower? 
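
For reference, a minimal sketch of how these two settings could be passed on the driver, assuming the application builds its own SparkConf. The object name, the app name and the 10min interval are illustrative values, not recommendations. As far as I understand, the blocking flag controls whether the cleaner thread waits for each shuffle cleanup task to finish.

```scala
import org.apache.spark.SparkConf

object CleanerConfSketch {
  // Illustrative values only; tune them for your own workload.
  val conf: SparkConf = new SparkConf()
    .setAppName("long-running-app") // hypothetical app name
    // How often the driver triggers a GC so that unreferenced
    // shuffles, RDDs and broadcasts can be found (default: 30min).
    .set("spark.cleaner.periodicGC.interval", "10min")
    // Whether the cleaner thread blocks on shuffle cleanup tasks
    // (default: false).
    .set("spark.cleaner.referenceTracking.blocking.shuffle", "true")
}
```

The same keys can also be given with --conf on spark-submit or in spark-defaults.conf.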


This is the job of ContextCleaner. There are a few properties that you can tweak to see if that helps:

Prathmesh Ranaut
We are running a long-running Spark application (which executes lots of
quick jobs using our scheduler) on a Spark 2.4.0 standalone cluster.
We see that old shuffle files (a week old, for example) are not deleted
during the execution of the application, which leads to out-of-disk-space
errors on the executors.
If we re-deploy the application, the Spark cluster takes care of the cleaning
and deletes the old shuffle data (since we have
-Dspark.worker.cleanup.enabled=true in the worker config).
I don't want to re-deploy our app every week or two; I want to be able to
configure Spark to clean old shuffle data (as it should).

How can I configure Spark to delete old shuffle data during the lifetime of
the application (not after)?