Hi Andrew, here's what I found. It may be relevant for people with the same issue:

1) There are 3 types of local resources in YARN (PUBLIC, PRIVATE, APPLICATION). More about it here: http://hortonworks.com/blog/management-of-application-dependencies-in-yarn/

2) The Spark cache is a resource with APPLICATION visibility.

3) Currently it's not possible to specify a quota for APPLICATION resources (https://issues.apache.org/jira/browse/YARN-882).

4) The only thing that can be tuned is these 2 settings:
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage - The maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager will check for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.

yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb - The minimum space that must be available on a disk for it to be used. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
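For reference, both settings go in yarn-site.xml on each NodeManager. A minimal sketch (the values 90.0 and 2048 are illustrative, not recommendations; tune them for your disks):

```xml
<!-- Illustrative values only; adjust for your cluster. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>2048</value>
</property>
```

Note these only mark a disk as bad once it's nearly full; they don't stop a single application from filling it in the first place.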

5) YARN's cache cleanup doesn't clean up APPLICATION resources: https://github.com/apache/hadoop/blob/8d58512d6e6d9fe93784a9de2af0056bcc316d96/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L511

As I understand it, APPLICATION resources are cleaned up when the Spark application terminates correctly (via sc.stop()). But in my case, once it filled all the disk space, the application got stuck and couldn't stop correctly. After I restarted YARN, I don't know of an easy way to trigger the cache cleanup other than doing it manually on all the nodes.
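For the manual route, something like the sketch below is what I mean. It assumes the stuck application's leftovers live in application_* directories under the appcache dir (e.g. /yarn/nm/usercache/root/appcache, as on my nodes) and that nothing else is running there; you'd run it on each node, e.g. over ssh. This is a hypothetical helper, not a supported YARN command:

```shell
# clean_appcache DIR: remove leftover application_* directories under DIR.
# DANGER: only run this when no applications are active on the node,
# otherwise you will delete live container data.
clean_appcache() {
  find "$1" -maxdepth 1 -type d -name 'application_*' -exec rm -rf {} +
}

# Example (path is an assumption; check your yarn.nodemanager.local-dirs):
# clean_appcache /yarn/nm/usercache/root/appcache
```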

Peter Rudenko

On 2015-07-10 20:07, Andrew Or wrote:
Hi Peter,

AFAIK Spark assumes infinite disk space, so there isn't really a way to limit how much space it uses. Unfortunately I'm not aware of a simpler workaround than to simply provision your cluster with more disk space. By the way, are you sure it was disk space that exceeded the limit, and not the number of inodes? If it's the latter, maybe you could control the ulimit of the container.

To answer your other question: if it can't persist to disk then yes, it will fail. It will only recompute from the data source if for some reason the blocks were evicted from memory, but that shouldn't happen in your case since you're using MEMORY_AND_DISK_SER.


2015-07-10 3:51 GMT-07:00 Peter Rudenko <petro.rudenko@gmail.com>:
Hi, I have a Spark ML workflow that uses some persist calls. When I launch it with a 1 TB dataset, it brings down the whole cluster because it fills all the disk space at /yarn/nm/usercache/root/appcache: http://i.imgur.com/qvRUrOp.png

I found a YARN setting:
yarn.nodemanager.localizer.cache.target-size-mb - Target size of localizer cache in MB, per NodeManager. It is a target retention size that only includes resources with PUBLIC and PRIVATE visibility and excludes resources with APPLICATION visibility.

But it excludes resources with APPLICATION visibility, and as I understand it the Spark cache is of the APPLICATION type.

Is it possible to restrict disk space for a Spark application? Will Spark fail if it isn't able to persist to disk (StorageLevel.MEMORY_AND_DISK_SER), or will it recompute from the data source?

Peter Rudenko