spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Or <and...@databricks.com>
Subject Re: How to restrict disk space for spark caches on yarn?
Date Fri, 10 Jul 2015 17:07:48 GMT
Hi Peter,

AFAIK Spark assumes infinite disk space, so there isn't really a way to
limit how much space it uses. Unfortunately I'm not aware of a simpler
workaround than to simply provision your cluster with more disk space. By
the way, are you sure that it's disk space that exceeded the limit, but not
the number of inodes? If it's the latter, maybe you could control the
ulimit of the container.

To answer your other question: if it can't persist to disk then yes it will
fail. It will only recompute from the data source if for some reason
someone evicted our blocks from memory, but that shouldn't happen in your
case since your'e using MEMORY_AND_DISK_SER.

-Andrew


2015-07-10 3:51 GMT-07:00 Peter Rudenko <petro.rudenko@gmail.com>:

>  Hi, i have a spark ML worklflow. It uses some persist calls. When i
> launch it with 1 tb dataset - it puts down all cluster, becauses it fills
> all disk space at /yarn/nm/usercache/root/appcache:
> http://i.imgur.com/qvRUrOp.png
>
> I found a yarn settings:
> *yarn*.nodemanager.localizer.*cache*.target-size-mb - Target size of
> localizer cache in MB, per nodemanager. It is a target retention size that
> only includes resources with PUBLIC and PRIVATE visibility and excludes
> resources with APPLICATION visibility
>
> But it excludes resources with APPLICATION visibility, and spark cache as
> i understood is of APPLICATION type.
>
> Is it possible to restrict a disk space for spark application? Will spark
> fail if it wouldn't be able to persist on disk
> (StorageLevel.MEMORY_AND_DISK_SER) or it would recompute from data source?
>
> Thanks,
> Peter Rudenko
>
>
>
>
>

Mime
View raw message