spark-user mailing list archives

From Peter Rudenko <petro.rude...@gmail.com>
Subject How to restrict disk space for spark caches on yarn?
Date Fri, 10 Jul 2015 10:51:39 GMT
Hi, I have a Spark ML workflow that uses some persist calls. When I
launch it with a 1 TB dataset, it brings down the whole cluster because it
fills up all the disk space under /yarn/nm/usercache/root/appcache:
http://i.imgur.com/qvRUrOp.png
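
To give a rough idea of what I mean (the input path, columns and feature
stage below are just placeholders, not the real pipeline), the persist
calls look something like this:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.storage.StorageLevel

// sqlContext is the SQLContext provided by spark-shell / created at job setup;
// the path stands in for the ~1 TB input.
val raw = sqlContext.read.parquet("hdfs:///data/training")

// One example feature stage; the real workflow has several.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))
  .setOutputCol("features")
val features = assembler.transform(raw)

// Each persist like this writes serialized blocks under
// /yarn/nm/usercache/<user>/appcache/<appId> on the NodeManagers' local disks.
features.persist(StorageLevel.MEMORY_AND_DISK_SER)
features.count()  // materialize the cache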

I found a YARN setting, yarn.nodemanager.localizer.cache.target-size-mb:
"Target size of localizer cache in MB, per NodeManager. It is a target
retention size that only includes resources with PUBLIC and PRIVATE
visibility and excludes resources with APPLICATION visibility."
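
Just to be explicit about which property I mean, here is a small sketch
against the standard YarnConfiguration constants (not code from my job)
that reads it back together with its 10240 MB default:

import org.apache.hadoop.yarn.conf.YarnConfiguration

// Picks up yarn-site.xml / yarn-default.xml from the classpath.
val yarnConf = new YarnConfiguration()

// yarn.nodemanager.localizer.cache.target-size-mb, default 10240 MB.
val localizerCacheTargetMb = yarnConf.getLong(
  YarnConfiguration.NM_LOCALIZER_CACHE_TARGET_SIZE_MB,
  YarnConfiguration.DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB)

println(s"localizer cache target size: $localizerCacheTargetMb MB")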

That setting, however, excludes resources with APPLICATION visibility,
and as I understand it, the Spark cache is of the APPLICATION type.

Is it possible to restrict disk space for a Spark application? Will
Spark fail if it cannot persist to disk
(StorageLevel.MEMORY_AND_DISK_SER), or will it recompute from the data source?

Thanks,
Peter Rudenko




