spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Worker Machine running out of disk for Long running Streaming process
Date Fri, 21 Aug 2015 20:44:23 GMT
Could you periodically (say, every 10 minutes) run System.gc() on the driver?
Cleaning up shuffles is tied to garbage collection on the driver.
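
Something like this on the driver should do it (a minimal sketch; the
thread name and the 10-minute period are just illustrative):

    import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

    // Schedule a periodic System.gc() on the driver. The ContextCleaner
    // tracks shuffles via weak references, so a full GC on the driver is
    // what lets it notice unreferenced shuffles and delete their files.
    val gcScheduler = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactory {
        def newThread(r: Runnable): Thread = {
          val t = new Thread(r, "periodic-gc") // illustrative name
          t.setDaemon(true) // don't keep the JVM alive for this thread
          t
        }
      })

    gcScheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = System.gc() },
      10, 10, TimeUnit.MINUTES) // initial delay 10 min, then every 10 min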


On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma <sharmagaurav32@gmail.com>
wrote:

> Hi All,
>
>
> I have a 24x7 Streaming process that runs on 2-hour windowed data.
>
> The issue I am facing is that my worker machines are running OUT OF DISK
> space.
>
> I checked, and the SHUFFLE FILES are not getting cleaned up:
>
>
> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>
> Ultimately the machines run out of disk space.
>
>
> I read about the *spark.cleaner.ttl* config param, which, from what I
> can understand from the documentation, cleans up all metadata older than
> the time limit.
>
> I went through https://issues.apache.org/jira/browse/SPARK-5836;
> it is marked resolved, but there is no code commit.
>
> Can anyone please throw some light on this issue?
>
>
>
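
Regarding the *spark.cleaner.ttl* mentioned above: it takes a duration in
seconds, and for a windowed job the TTL must comfortably exceed the window
length, or Spark may forget state the window still needs. A rough sketch
(the app name and the 4-hour value are just illustrative):

    import org.apache.spark.SparkConf

    // spark.cleaner.ttl is a duration in seconds; metadata older than
    // this is forgotten. Keep it well above the 2-hour window.
    val conf = new SparkConf()
      .setAppName("streaming-job")        // illustrative name
      .set("spark.cleaner.ttl", "14400")  // 4 hours = 2x the window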
