spark-user mailing list archives

From Alex Landa <metalo...@gmail.com>
Subject Re: Long-Running Spark application doesn't clean old shuffle data correctly
Date Wed, 24 Jul 2019 05:34:29 GMT
Hi Keith,

I don't think that we keep such references.
But we do experience exceptions during the job execution that we catch and
retry (timeouts/network issues from different data sources).
Can they affect RDD cleanup?

Thanks,
Alex
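
Keith's explanation in the quoted reply below (shuffle data is removed once the object referencing it is garbage collected) can be illustrated with a small, stdlib-only Python sketch. It is an analogy, not Spark's implementation. `ShuffleDependency`, `ReferenceTrackingCleaner`, `make_shuffle` and `demo` are hypothetical names, a temporary file stands in for shuffle output, and `weakref.finalize` stands in for the JVM-side reference tracking. Each object holds a reference to itself, so only Python's cyclic collector can free it, and automatic collection is paused during the demo. Together these mimic a driver heap that is rarely garbage collected.

```python
import gc
import os
import tempfile
import weakref


class ShuffleDependency:
    """Hypothetical stand-in for a driver-side object that owns shuffle output."""

    def __init__(self, shuffle_id: int, path: str) -> None:
        self.shuffle_id = shuffle_id
        self.path = path
        # Self-reference puts the object in a reference cycle, so CPython frees
        # it only when the cyclic garbage collector runs (like a JVM heap object
        # that lingers until the next GC).
        self._self_ref = self


class ReferenceTrackingCleaner:
    """Hypothetical cleaner: deletes a shuffle file once its owner is collected."""

    def __init__(self) -> None:
        self.deleted: list[int] = []

    def register(self, dep: ShuffleDependency) -> None:
        # Pass only the id and path to the callback; capturing `dep` itself
        # would keep it alive forever.
        weakref.finalize(dep, self._cleanup, dep.shuffle_id, dep.path)

    def _cleanup(self, shuffle_id: int, path: str) -> None:
        if os.path.exists(path):
            os.remove(path)
        self.deleted.append(shuffle_id)


def make_shuffle(cleaner: ReferenceTrackingCleaner, shuffle_id: int,
                 directory: str) -> ShuffleDependency:
    """Create a fake shuffle file and register its owner with the cleaner."""
    fd, path = tempfile.mkstemp(prefix=f"shuffle_{shuffle_id}_", dir=directory)
    os.close(fd)
    dep = ShuffleDependency(shuffle_id, path)
    cleaner.register(dep)
    return dep


def demo() -> list[tuple[str, list[int]]]:
    cleaner = ReferenceTrackingCleaner()
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # collect only when asked, like a rarely collected driver heap
    try:
        with tempfile.TemporaryDirectory() as workdir:
            cache = [make_shuffle(cleaner, 0, workdir)]  # still referenced
            make_shuffle(cleaner, 1, workdir)  # owner dropped immediately
            log.append(("before gc", sorted(cleaner.deleted)))
            gc.collect()
            log.append(("after gc", sorted(cleaner.deleted)))
            cache.clear()  # release the last reference to shuffle 0
            gc.collect()
            log.append(("after releasing cache", sorted(cleaner.deleted)))
    finally:
        if was_enabled:
            gc.enable()
    return log


for step, deleted in demo():
    print(f"{step}: deleted shuffles {deleted}")
```

In this sketch, shuffle 1 is unreferenced right away but its file survives until a collection actually runs, while shuffle 0 survives every collection for as long as `cache` holds it. The first case is the one `spark.cleaner.periodicGC.interval` (mentioned further down the thread) is meant to address; the second is the situation Keith asks about.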

On Sun, Jul 21, 2019 at 10:49 PM Keith Chapman <keithgchapman@gmail.com> wrote:

> Hi Alex,
>
> Shuffle files in Spark are deleted when the object holding a reference to
> the shuffle file on disk goes out of scope and is garbage collected by the
> JVM. Could it be the case that you are keeping these objects alive?
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
>
> On Sun, Jul 21, 2019 at 12:19 AM Alex Landa <metaloink@gmail.com> wrote:
>
>> Thanks,
>> I looked into these options, the cleaner periodic interval is set to 30
>> min by default.
>> The blocking option for shuffle,
>> *spark.cleaner.referenceTracking.blocking.shuffle*, is set to false by
>> default.
>> What are the implications of setting it to true?
>> Will it make the driver slower?
>>
>> Thanks,
>> Alex
>>
>> On Sun, Jul 21, 2019 at 9:06 AM Prathmesh Ranaut Gmail <prathmesh.ranaut@gmail.com> wrote:
>>
>>> This is the job of the ContextCleaner. There are a few properties that you
>>> can tweak to see if that helps:
>>> spark.cleaner.periodicGC.interval
>>> spark.cleaner.referenceTracking
>>> spark.cleaner.referenceTracking.blocking.shuffle
>>>
>>> Regards
>>> Prathmesh Ranaut
>>>
>>> On Jul 21, 2019, at 11:31 AM, Alex Landa <metaloink@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We are running a long-running Spark application (which executes lots of
>>> quick jobs using our scheduler) on a Spark 2.4.0 standalone cluster.
>>> We see that old shuffle files (a week old, for example) are not deleted
>>> during the execution of the application, which leads to out-of-disk-space
>>> errors on the executor.
>>> If we re-deploy the application, the Spark cluster takes care of the
>>> cleanup and deletes the old shuffle data (since we have
>>> /-Dspark.worker.cleanup.enabled=true/ in the worker config).
>>> Rather than re-deploying our app every week or two, I would like to be
>>> able to configure Spark to clean old shuffle data (as it should).
>>>
>>> How can I configure Spark to delete old shuffle data during the lifetime
>>> of the application (not after it ends)?
>>>
>>>
>>> Thanks,
>>> Alex
>>>
>>>
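
The three properties Prathmesh lists above are ordinary Spark configuration keys read when the SparkContext starts, so they can be set on the application's `SparkConf` (or passed with `--conf` to `spark-submit`). The fragment below assumes a PySpark driver, which the thread does not state. The values are illustrative only; according to the Spark configuration documentation the defaults are `true`, `30min` and `false` respectively.

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    # Enable context cleaning (default: true).
    .set("spark.cleaner.referenceTracking", "true")
    # Trigger a driver GC more often than the 30min default, so unreferenced
    # shuffle dependencies are collected and their files cleaned up sooner.
    # 10min is an illustrative value, not a recommendation.
    .set("spark.cleaner.periodicGC.interval", "10min")
    # Make the cleaning thread block on shuffle cleanup tasks (default: false).
    .set("spark.cleaner.referenceTracking.blocking.shuffle", "true")
)

# These settings must be in place before the SparkContext is created.
spark = SparkSession.builder.config(conf=conf).getOrCreate()
```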
