YARN only has the ability to kill containers, not to checkpoint or suspend them with a signal. If you use too much memory, it will simply kill the tasks' containers based on the YARN config.
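For context, the limit YARN enforces on a container is the executor memory plus the configured memory overhead; once the container's process tree grows beyond that, the NodeManager kills it. A minimal PySpark sketch of the two settings involved (Spark 1.x property names on YARN; the values are only illustrative, roughly matching the 8G + 6G setup mentioned below):

    from pyspark import SparkConf, SparkContext

    # Illustrative values only: the YARN container size is roughly the
    # executor heap plus the YARN memory overhead, and exceeding it gets
    # the container killed (exit status 143, as in the log below).
    conf = (SparkConf()
            .setAppName("memory-limit-sketch")
            .set("spark.executor.memory", "8g")                  # executor heap
            .set("spark.yarn.executor.memoryOverhead", "6144"))  # overhead in MB (Spark 1.x name)

    sc = SparkContext(conf=conf)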
Hi Sven,

What version of Spark are you running? Recent versions have a change that allows PySpark to share a pool of processes instead of starting a new one for each task.

-Sandy

On Fri, Jan 23, 2015 at 9:36 AM, Sven Krasser <firstname.lastname@example.org> wrote:

Hey all,

Any ideas what's happening here and how I can get the number of pyspark.daemon processes back to a more reasonable count?
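A minimal sketch, assuming the change Sandy refers to is the Python worker reuse introduced around Spark 1.2 and controlled by spark.python.worker.reuse; when it is enabled (the default in 1.2+), PySpark hands tasks to a pool of long-lived workers instead of forking a fresh one for every task:

    from pyspark import SparkConf, SparkContext

    # Assumption: spark.python.worker.reuse is the setting behind the
    # pooled-process behavior mentioned above; "true" is the default in 1.2+.
    conf = (SparkConf()
            .setAppName("worker-reuse-sketch")
            .set("spark.python.worker.reuse", "true"))

    sc = SparkContext(conf=conf)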
I am running into a problem where YARN kills containers for being over their memory allocation (which is about 8G for executors plus 6G for overhead), and I noticed that in those containers there are tons of pyspark.daemon processes hogging memory. Here's a snippet from a container with 97 pyspark.daemon processes. The total sum of RSS usage across all of these is 1,764,956 pages (i.e. 6.7GB on the system).

2015-01-23 15:36:53,654 INFO [Reporter] yarn.YarnAllocationHandler (Logging.scala:logInfo(59)) - Container marked as failed: container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics: Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1421692415636_0052_01_000030 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m pyspark.daemon
|- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m pyspark.daemon
|- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m pyspark.daemon
[...]
Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c

Thank you!
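Not something from the thread itself, just a hypothetical helper for checking a node: it counts the processes whose command line contains pyspark.daemon and sums their resident memory (ps reports RSS in KiB on Linux), which reproduces the kind of RSS total quoted above:

    import subprocess

    # Hypothetical helper: count pyspark.daemon processes on this node and
    # sum their resident set size (ps reports RSS in KiB on Linux).
    out = subprocess.check_output(["ps", "-eo", "rss=,args="]).decode()

    daemons = [line for line in out.splitlines() if "pyspark.daemon" in line]
    rss_kib = sum(int(line.split(None, 1)[0]) for line in daemons)

    print("pyspark.daemon processes: %d" % len(daemons))
    print("total RSS: %.1f GiB" % (rss_kib / (1024.0 ** 2)))

Note that summing per-process RSS over forked workers double counts pages shared copy-on-write, so the total can overstate real usage, though YARN's default process-tree monitor does a similar per-process sum.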