spark-user mailing list archives

From Michael Allman <>
Subject is spark.cleaner.ttl safe?
Date Tue, 11 Mar 2014 20:58:36 GMT

I've been trying to run an iterative spark job that spills 1+ GB to disk 
per iteration on a system with limited disk space. I believe there's 
enough space if spark would clean up unused data from previous iterations, 
but as it stands the number of iterations I can run is limited by 
available disk space.

I found a thread on the usage of spark.cleaner.ttl on the old Spark Users 
Google group here: !topic/spark-users/9ebKcNCDih4

I think this setting may be what I'm looking for; however, the cleaner 
seems to delete data that's still in use. The effect is that I get bizarre 
exceptions from Spark complaining about missing broadcast data, or 
ArrayIndexOutOfBoundsException. When is spark.cleaner.ttl safe to use? Is it 
supposed to delete in-use data, or is this a bug/shortcoming?
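For context, here is a minimal sketch of how I understand the two approaches: setting spark.cleaner.ttl globally (in seconds), versus explicitly unpersisting each iteration's intermediate data so nothing depends on the time-based cleaner. The app name, input path, TTL value, and iteration count below are illustrative, not from my actual job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Time-based cleanup: forget metadata, shuffle files, and cached blocks
// older than the TTL. Value is in seconds (3600 = 1 hour, illustrative).
val conf = new SparkConf()
  .setAppName("iterative-job")            // hypothetical app name
  .set("spark.cleaner.ttl", "3600")

val sc = new SparkContext(conf)

// Alternative, explicit cleanup: cache each iteration's result,
// materialize it, then unpersist the previous one. This frees the old
// blocks deterministically without relying on the TTL cleaner.
var current = sc.textFile("input").map(_.length).cache()  // "input" is a placeholder path
for (i <- 1 to 10) {
  val next = current.map(_ + 1).cache()
  next.count()          // materialize before dropping the parent
  current.unpersist()   // release the previous iteration's blocks
  current = next
}
```

With the explicit approach, the question of whether the TTL cleaner deletes in-use data doesn't arise, since only data you've finished with is released.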
