spark-user mailing list archives

From "Muttineni, Vinay" <vmuttin...@ebay.com>
Subject Losing executors due to memory problems
Date Thu, 11 Aug 2016 23:41:27 GMT
Hello,
I have a Spark job that reads data from two tables into two DataFrames, which are
subsequently converted to RDDs and joined on a common key.
Each table is about 10 TB in size, but after filtering, the two RDDs are about 500 GB each.
I have 800 executors with 8 GB of memory per executor.
Everything works fine until the join stage, which throws the error below.
I tried increasing the number of partitions before the join, but it didn't change anything.
Any ideas how I can fix this, and what I might be doing wrong?
Thanks,
Vinay
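
[For reference, a minimal Scala sketch of the pipeline described above, against the Spark 1.6-era API. The table names, key column, filter predicate, and partition count are hypothetical stand-ins, not taken from the original job.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-sketch"))
    val sqlContext = new HiveContext(sc)

    // Read the two ~10 TB tables and filter them down (~500 GB each).
    // Table names, key column, and predicate are hypothetical.
    val left  = sqlContext.table("db.table_a").filter("dt >= '2016-08-01'")
    val right = sqlContext.table("db.table_b").filter("dt >= '2016-08-01'")

    // Convert the DataFrames to key/value RDDs on the common join key.
    val leftRdd  = left.rdd.map(r => (r.getAs[String]("key"), r))
    val rightRdd = right.rdd.map(r => (r.getAs[String]("key"), r))

    // More partitions before the join means smaller per-task shuffle blocks.
    val joined = leftRdd.join(rightRdd, numPartitions = 4000)

    joined.count() // forces the join stage
    sc.stop()
  }
}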

ExecutorLostFailure (executor 208 exited caused by one of the running tasks) Reason: Container
marked as failed: container_1469773002212_96618_01_000246 on host:. Exit status: 143. Diagnostics:
Container [pid=31872,containerID=container_1469773002212_96618_01_000246] is running beyond
physical memory limits. Current usage: 15.2 GB of 15.1 GB physical memory used; 15.9 GB of
31.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1469773002212_96618_01_000246 :
         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES)
RSSMEM_USAGE(PAGES) FULL_CMD_LINE
         |- 31883 31872 31872 31872 (java) 519517 41888 17040175104 3987193 /usr/java/latest/bin/java
-server -XX:OnOutOfMemoryError=kill %p -Xms14336m -Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp
-Dspark.driver.port=32988 -Dspark.ui.port=0 -Dspark.akka.frameSize=256 -Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246
-XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.12.7.4:32988
--executor-id 208 --hostname x.com --cores 11 --app-id application_1469773002212_96618 --user-class-path
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar
         |- 31872 16580 31872 31872 (bash) 0 0 9146368 267 /bin/bash -c LD_LIBRARY_PATH=/apache/hadoop/lib/native:/apache/hadoop/lib/native/Linux-amd64-64:
/usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms14336m -Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/
appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp '-Dspark.driver.port=32988'
'-Dspark.ui.port=0' '-Dspark.akka.frameSize=256' -Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246
-XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@1.4.1.6:32988
--executor-id 208 --hostname x.com --cores 11 --app-id application_1469773002212_96618 --user-class-path
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar
1> /hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stdout
2> /hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
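
[For context on the log above: exit code 143 is a SIGTERM from the YARN NodeManager, and the 15.1 GB physical-memory cap is the container size, i.e. the executor heap (-Xmx14336m here) plus spark.yarn.executor.memoryOverhead, which also has to cover off-heap allocations such as network buffers. A hedged sketch of raising the overhead at submission time follows; the overhead value is illustrative, not prescriptive.]

import org.apache.spark.{SparkConf, SparkContext}

object SubmitConfSketch {
  def main(args: Array[String]): Unit = {
    // Sketch only: on YARN, container memory = spark.executor.memory
    // + spark.yarn.executor.memoryOverhead, so "running beyond physical
    // memory limits" is usually addressed by raising the overhead
    // rather than the heap.
    val conf = new SparkConf()
      .setAppName("join-sketch")
      .set("spark.executor.memory", "14g")
      .set("spark.yarn.executor.memoryOverhead", "2048") // MiB; hypothetical value
    val sc = new SparkContext(conf)
    sc.stop()
  }
}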


