spark-user mailing list archives

From Antony Mayi <>
Subject Re: spark-local dir running out of space during long ALS run
Date Sun, 15 Feb 2015 20:42:58 GMT
Would setting spark.cleaner.ttl help here?
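
For reference, a minimal sketch of how that property could be set in spark-defaults.conf on Spark 1.x. The value of 3600 seconds is purely an illustrative assumption, and whether the TTL-based cleaner actually removes the spark-local files in this situation is not confirmed in this thread.

```
# spark-defaults.conf
# Illustrative value only (seconds); not a recommendation from this thread.
spark.cleaner.ttl    3600
```

Note that the Spark 1.x documentation warns that any RDD kept persisted for longer than this duration is cleared as well, so the value would need to exceed the longest single trainImplicit() run.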

     On Sunday, 15 February 2015, 18:23, Antony Mayi <> wrote:

I am running a large ALS job on Spark 1.2.0 on YARN (CDH 5.3.0). ALS is using about 3 billion
ratings, and I am doing several trainImplicit() runs in a loop within one Spark session. I
have a four-node cluster with 3TB of disk space on each node. Before starting the job, less
than 8% of the disk space is used. While ALS is running, I can see the disk usage growing
rapidly, mainly because of files being stored under yarn/local/usercache/user/appcache/application_XXX_YYY/spark-local-ZZZ-AAA.
After about 10 hours the disk usage hits 90% and YARN kills the affected containers.
Am I missing some cleanup step while looping over the several trainImplicit() calls?
Taking 4*3TB of disk space seems immense.
Thanks for any help,
Antony
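
Going from under 8% to 90% of a 3TB disk means at least roughly 2.46TB written per node. To see which spark-local directories account for that growth, something like the following could be run on each node between trainImplicit() calls. It is only a sketch using the Python standard library; the function names and the example path are hypothetical, and the real appcache location depends on how the YARN local directories are configured.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # A file may be deleted between listing and stat; skip it.
                pass
    return total


def spark_local_usage(appcache_dir):
    """Map each spark-local-* directory under appcache_dir to its size in bytes."""
    usage = {}
    for dirpath, dirnames, _filenames in os.walk(appcache_dir):
        for d in list(dirnames):
            if d.startswith("spark-local-"):
                full = os.path.join(dirpath, d)
                usage[full] = dir_size_bytes(full)
                # Already measured; prune so os.walk does not descend into it.
                dirnames.remove(d)
    return usage


if __name__ == "__main__":
    # Hypothetical path; substitute the appcache directory used on the node.
    appcache = "yarn/local/usercache/user/appcache"
    for path, size in sorted(spark_local_usage(appcache).items()):
        print("%10.2f GB  %s" % (size / 1e9, path))
```

Comparing the output before and after each training run would show whether files from earlier iterations are being kept around or whether a single run already needs that much space.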
