spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: spark-local dir running out of space during long ALS run
Date Mon, 16 Feb 2015 20:04:53 GMT
Correct, brute-force cleanup is not useful. Since Spark 1.0, Spark can do
automatic cleanup of files based on which RDDs are used/garbage collected
by the JVM. That would be the best way, but it depends on the JVM GC
characteristics. If you force a GC periodically in the driver, that might
help you get rid of files on the workers that are no longer needed.
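
For example, here is a minimal sketch of that pattern, assuming a Scala
driver, a hypothetical CSV ratings path, and made-up ALS parameters:
unpersist the model's RDDs once you are done with them, then hint the
driver JVM to GC so the ContextCleaner can remove the corresponding files
on the workers.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsLoop {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("als-loop"))

    // Hypothetical input path and format; replace with your own ratings source.
    val ratings = sc.textFile("hdfs:///path/to/ratings.csv").map { line =>
      val Array(user, item, value) = line.split(',')
      Rating(user.toInt, item.toInt, value.toDouble)
    }.cache()

    for (run <- 1 to 5) {
      // Made-up rank/iterations/lambda/alpha, just to illustrate the loop.
      val model = ALS.trainImplicit(ratings, rank = 50, iterations = 10,
        lambda = 0.01, alpha = 40.0)

      // ... evaluate or save the model here ...

      // Drop the model's cached factor RDDs so they become unreachable,
      // then ask the driver JVM for a GC; the ContextCleaner reacts to the
      // collected references and removes the matching shuffle/cache files
      // on the workers.
      model.userFeatures.unpersist()
      model.productFeatures.unpersist()
      System.gc()
    }

    sc.stop()
  }
}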

TD

On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi <antonymayi@yahoo.com.invalid>
wrote:

> spark.cleaner.ttl is not the right way - it seems to be really designed for
> streaming. Although it keeps the disk usage under control, it also causes
> the loss of RDDs and broadcasts that are required later, leading to a crash.
>
> Is there any other way?
> thanks,
> Antony.
>
>
>   On Sunday, 15 February 2015, 21:42, Antony Mayi <antonymayi@yahoo.com>
> wrote:
>
>
>
> spark.cleaner.ttl ?
>
>
>   On Sunday, 15 February 2015, 18:23, Antony Mayi <antonymayi@yahoo.com>
> wrote:
>
>
>
> Hi,
>
> I am running a bigger ALS job on Spark 1.2.0 on YARN (CDH 5.3.0) - the ALS is
> using about 3 billion ratings and I am doing several trainImplicit() runs in
> a loop within one Spark session. I have a four-node cluster with 3 TB of disk
> space on each. Before starting the job, less than 8% of the disk space is
> used. While the ALS is running I can see the disk usage rapidly growing,
> mainly because of files being stored
> under yarn/local/usercache/user/appcache/application_XXX_YYY/spark-local-ZZZ-AAA.
> After about 10 hours the disk usage hits 90% and YARN kills the particular
> containers.
>
> Am I missing some cleanup step while looping over the several
> trainImplicit() calls? Taking 4*3 TB of disk space seems immense.
>
> thanks for any help,
> Antony.
>
>
>
>
>
>
