spark-user mailing list archives

From: Mike Sukmanowsky
Subject: Python memory included YARN-monitored memory?
Date: Fri, 27 May 2016 14:11:43 GMT

Hi everyone,

This is more of a YARN/OS question than a Spark one, but it would be good
to clarify this in the docs somewhere once I get an answer.

We use PySpark for all our Spark applications running on EMR. Like many
users, we're accustomed to seeing the occasional ExecutorLostFailure after
YARN kills a container using more memory than it was allocated.

We're beginning to tune spark.yarn.executor.memoryOverhead, but before
changing it I wanted to check: does YARN monitor the memory usage of both
the executor JVM and the spawned pyspark.daemon process, or just the JVM?
Inspecting one of the YARN nodes would seem to indicate that only the JVM
is monitored, since the spawned daemon gets a separate process ID and
process group, but I wanted to confirm, as it could make a big difference
to PySpark users hoping to tune things.
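
A separate process ID and process group do not by themselves show whether
the daemon is a child of the executor JVM. A minimal sketch for checking
that on a Linux node by following parent PIDs through /proc is below. The
function names are made up for illustration, and whether YARN's accounting
actually follows this parent/child relationship is the open question here,
not something the sketch settles.

```python
def read_ppid(pid):
    """Return the parent PID of `pid`, read from /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so parse the fields that follow the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])  # fields are: state, ppid, pgrp, ...


def is_descendant(pid, ancestor, ppid_of=read_ppid):
    """Return True if `ancestor` appears in the parent chain of `pid`.

    `ppid_of` is a hypothetical hook that maps a PID to its parent PID;
    it defaults to reading /proc but can be swapped out for testing.
    """
    seen = set()
    while pid > 1 and pid not in seen:
        seen.add(pid)
        try:
            pid = ppid_of(pid)
        except (OSError, KeyError):
            # The process is gone or unknown, so the chain cannot be followed.
            return False
        if pid == ancestor:
            return True
    return False
```

On a node, this would be called as is_descendant(daemon_pid, executor_pid),
with both PIDs looked up by hand (for example from ps output); those two
names are placeholders.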

