This is the job of ContextCleaner. There are a few properties you can tweak to see if that helps:
spark.cleaner.periodicGC.interval
spark.cleaner.referenceTracking
spark.cleaner.referenceTracking.blocking.shuffle
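
For example, if you build the SparkSession yourself, these can be set on the builder at startup. A minimal sketch in Scala; the app name and values are illustrative, not recommendations:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("long-running-app")  // hypothetical app name
    // Trigger a periodic full GC on the driver so unused shuffle references get collected
    .config("spark.cleaner.periodicGC.interval", "30min")
    // Keep reference tracking on so ContextCleaner is notified when shuffles become unreachable
    .config("spark.cleaner.referenceTracking", "true")
    // Make shuffle cleanup calls blocking
    .config("spark.cleaner.referenceTracking.blocking.shuffle", "true")
    .getOrCreate()

Lowering the periodic GC interval is usually the first thing to try, since shuffle files are only removed once the driver garbage-collects the objects that reference them.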

Regards
Prathmesh Ranaut
On Jul 21, 2019, at 11:31 AM, Alex Landa <metaloink@gmail.com> wrote:

Hi,

We are running a long-running Spark application (which executes lots of
quick jobs using our scheduler) on a Spark standalone cluster, version 2.4.0.
We see that old shuffle files (a week old, for example) are not deleted
during the execution of the application, which leads to out-of-disk-space
errors on the executors.
If we re-deploy the application, the Spark cluster takes care of the cleanup
and deletes the old shuffle data (since we have
-Dspark.worker.cleanup.enabled=true in the worker config).
I don't want to re-deploy our app every week or two, but rather to
configure Spark to clean old shuffle data (as it should).

How can I configure Spark to delete old shuffle data during the lifetime of
the application (not after)?


Thanks,
Alex