YARN can only kill containers; it cannot checkpoint them or suspend them with a signal. If you use too much memory, it will simply kill tasks based on the YARN config.

On Friday, January 23, 2015, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
Hi Sven,

What version of Spark are you running?  Recent versions have a change that allows PySpark to share a pool of processes instead of starting a new one for each task.


On Fri, Jan 23, 2015 at 9:36 AM, Sven Krasser <krasser@gmail.com> wrote:
Hey all,

I am running into a problem where YARN kills containers for being over their memory allocation (which is about 8G for executors plus 6G for overhead), and I noticed that in those containers there are tons of pyspark.daemon processes hogging memory. Here's a snippet from a container with 97 pyspark.daemon processes. The total sum of RSS usage across all of these is 1,764,956 pages (i.e. 6.7GB on the system).

Any ideas what's happening here and how I can get the number of pyspark.daemon processes back to a more reasonable count?

2015-01-23 15:36:53,654 INFO [Reporter] yarn.YarnAllocationHandler (Logging.scala:logInfo(59)) - Container marked as failed: container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics: Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1421692415636_0052_01_000030 :
|- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m pyspark.daemon
|- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m pyspark.daemon
|- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m pyspark.daemon
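
For anyone who wants to total the numbers from a dump like the one above, here is a minimal Python sketch. It assumes that the last numeric column on each line is RSS in pages and the one before it is virtual memory in bytes, and that the page size is 4 KiB; both are assumptions and should be checked against your own system. The names summarize_daemons and pages_to_gib are made up for this illustration.

```python
import re

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages; verify with `getconf PAGESIZE`

# Assumed layout of one process-tree line:
# |- PID PPID PGRPID SESSID (CMD) UTIME STIME VMEM_BYTES RSS_PAGES FULL_CMD_LINE
LINE_RE = re.compile(
    r"^\s*\|-\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\((.+?)\)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)$"
)


def summarize_daemons(dump_text, needle="pyspark.daemon"):
    """Return (process count, total RSS pages) for command lines containing needle."""
    count = 0
    rss_pages = 0
    for line in dump_text.splitlines():
        match = LINE_RE.match(line)
        if match and needle in match.group(10):
            count += 1
            rss_pages += int(match.group(9))
    return count, rss_pages


def pages_to_gib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to GiB for the given page size."""
    return pages * page_size / 1024 ** 3


# The three lines quoted in this message; the full dump is in the gist.
sample = """\
|- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m pyspark.daemon
|- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m pyspark.daemon
|- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m pyspark.daemon
"""

count, pages = summarize_daemons(sample)
print(count, "daemon processes,", pages, "RSS pages,",
      round(pages_to_gib(pages), 3), "GiB")
```

Running the same function over the full dump should give the process count and page total for the whole container, which can then be compared with the container limit in the log line.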

Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c

Thank you!