spark-user mailing list archives

From Michael Albert <>
Subject How does unmanaged memory work with the executor memory limits?
Date Mon, 12 Jan 2015 22:06:39 GMT
My executors apparently are being terminated because they are "running beyond physical memory
limits" according to the "yarn-hadoop-nodemanager" logs on the worker nodes (/mnt/var/log/hadoop
on AWS EMR).  I'm setting the "driver-memory" to 8G. However, looking at "stdout" in userlogs,
I can see GC going on, but the lines look like "6G -> 5G(7.2G), 0.45secs", so the GC seems
to think that the process is using about 6G of space, not 8G of space.  However, "ps aux"
shows an RSS hovering just below 8G.
The process does a "mapPartitionsWithIndex", and the process uses compression, which (I believe)
calls into the native zlib library (the overall purpose is to convert each partition into
a "matlab" file).
Could it be that the YARN container is counting both the memory used by the JVM proper and
the memory used by zlib, while the GC only "sees" the JVM-internal memory?  So the GC keeps
the heap usage "reasonable", e.g., 6G in an 8G container, but then zlib grabs some native memory
on top of that, and YARN terminates the container?
If so, is there anything I can do so that I tell YARN to watch for a larger memory limit than
I tell the JVM to use for its heap?
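In other words, I'm hoping for something like the sketch below. The `spark.yarn.executor.memoryOverhead` / `spark.yarn.driver.memoryOverhead` settings (values in megabytes) are my guess at the relevant knobs for padding the YARN container beyond the JVM heap; I have not confirmed that they fix the terminations, and the overhead values here are made up for illustration:

```shell
# Hypothetical invocation: ask YARN for ~2G of container headroom beyond the
# 8G JVM heap, so native allocations (e.g., zlib) don't push the container
# past its physical-memory limit.
spark-submit \
  --master yarn \
  --driver-memory 8G \
  --executor-memory 8G \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.yarn.driver.memoryOverhead=2048 \
  my-job.jar
```

If that's the right knob, presumably the container size becomes executor memory plus the overhead, while the JVM heap stays at the `--executor-memory` value.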
Sincerely, Mike