spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Any setting or configuration that I can use in spark that would dump more info on job errors
Date Thu, 14 Nov 2013 04:39:48 GMT
Hi Hussam,

Have you looked at the stdout and stderr files from the worker process? You can find them
in the “work” directory under SPARK_HOME on the slave node. They might contain some information
about why it crashed. Otherwise, I’d recommend profiling the workers with tools like jmap (to
see which objects take up memory) or jstack (to see what each thread is doing). A common cause
of this kind of problem is setting too low a level of parallelism.
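For example, a small Java program along these lines could print the end of one executor's stderr. This is a minimal sketch that assumes the standalone worker keeps each executor's output under SPARK_HOME/work/<app-id>/<executor-id>/ (for instance app-20131111190659-0000 and 0 from the quoted worker log). The class name TailExecutorLog and the default of 50 lines are illustrative choices, not part of Spark.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative helper (hypothetical name): prints the last lines of an executor's
// stderr, assuming the layout SPARK_HOME/work/<appId>/<executorId>/stderr.
public class TailExecutorLog {

    // Builds the assumed path to an executor's "stdout" or "stderr" file.
    static Path executorLog(String sparkHome, String appId, String executorId, String stream) {
        return Paths.get(sparkHome, "work", appId, executorId, stream);
    }

    // Returns at most the last n lines of the file, oldest first.
    static List<String> tail(Path file, int n) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            window.addLast(line);
            if (window.size() > n) {
                window.removeFirst(); // drop the oldest line to keep at most n
            }
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: TailExecutorLog <SPARK_HOME> <appId> <executorId> [lines]");
            System.exit(2);
        }
        int n = args.length > 3 ? Integer.parseInt(args[3]) : 50;
        Path stderr = executorLog(args[0], args[1], args[2], "stderr");
        for (String line : tail(stderr, n)) {
            System.out.println(line);
        }
    }
}
```

On poc3 it could be run as `java TailExecutorLog /opt/spark-0.8.0 app-20131111190659-0000 0`. If the executor died from an exception or an OutOfMemoryError, the stack trace would likely show up near the end of that file.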

Matei

On Nov 12, 2013, at 8:53 AM, Hussam_Jarada@Dell.com wrote:

> Hi,
> 
> Using Spark 0.8 and Hadoop 1.2.1 on a cluster of 2 nodes, each with 16 CPUs and 8 GB of RAM allocated.
> 
> I am running into a case where, if I try to save a very large JavaRDD<String> that was created with parallelize() from a Java List<String>, my job's workers fail as follows:
> 
> 13/11/11 19:23:48 INFO Worker: Executor app-20131111191414-0001/2 finished with state FAILED message Command exited with code 1 exitStatus 1
> 
> It looks like the Spark driver tries 5 times and then decides to kill the process.
> 
> Any help on how to get more info on the reason for the failure, or on what "code 1 exitStatus 1" means here?
> 
> Is there any setting or configuration in Spark that would dump more info on errors?
> 
> Here are my logs:
> 
> 13/11/11 19:14:50 INFO Worker: Asked to launch executor app-20131111190659-0000/0 for OMDBQueryService
> 13/11/11 19:14:50 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "0" "poc3" "16"
> 13/11/11 19:16:47 INFO Worker: Executor app-20131111190659-0000/0 finished with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:16:47 INFO Worker: Asked to launch executor app-20131111190659-0000/2 for OMDBQueryService
> 13/11/11 19:16:47 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "2" "poc3" "16"
> 13/11/11 19:16:53 INFO Worker: Executor app-20131111190659-0000/2 finished with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:16:53 INFO Worker: Asked to launch executor app-20131111190659-0000/4 for OMDBQueryService
> 13/11/11 19:16:53 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "4" "poc3" "16"
> 13/11/11 19:17:02 INFO Worker: Executor app-20131111190659-0000/4 finished with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:17:02 INFO Worker: Asked to launch executor app-20131111190659-0000/6 for OMDBQueryService
> 13/11/11 19:17:02 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "6" "poc3" "16"
> 13/11/11 19:17:09 INFO Worker: Executor app-20131111190659-0000/6 finished with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:17:09 INFO Worker: Asked to launch executor app-20131111190659-0000/8 for OMDBQueryService
> 13/11/11 19:17:09 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "8" "poc3" "16"
> 13/11/11 19:17:17 INFO Worker: Executor app-20131111190659-0000/8 finished with state FAILED message Command exited with code 1 exitStatus 1
> 13/11/11 19:17:17 INFO Worker: Asked to launch executor app-20131111190659-0000/10 for OMDBQueryService
> 13/11/11 19:17:17 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "10" "poc3" "16"
> 13/11/11 19:17:20 INFO Worker: Asked to kill executor app-20131111190659-0000/10
> 13/11/11 19:17:20 INFO ExecutorRunner: Killing process!
> 13/11/11 19:17:20 INFO ExecutorRunner: Runner thread for executor app-20131111190659-0000/10 interrupted
> 13/11/11 19:17:21 INFO Worker: Executor app-20131111190659-0000/10 finished with state KILLED
> 
> Thanks,
> Hussam
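The quoted worker log follows a regular pattern: executors 0, 2, 4, 6 and 8 of app-20131111190659-0000 each finish with state FAILED (exit status 1), and executor 10 is then KILLED. As a rough sketch for summarizing longer logs of this kind, the Java class below tallies the final state reported for each executor. The class name and the regular expression are illustrative assumptions based only on the line format shown in this thread, not on any documented Spark log format.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser (hypothetical name) for standalone Worker log lines such as
// "... INFO Worker: Executor <appId>/<execId> finished with state <STATE> ...".
public class ExecutorStateSummary {

    // Assumed line format, inferred only from the log excerpt in this thread.
    private static final Pattern FINISHED =
            Pattern.compile("Executor (\\S+) finished with state (\\w+)");

    // Maps each executor ("appId/execId") to its reported final state,
    // in the order the executors finished.
    static Map<String, String> finalStates(List<String> logLines) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = FINISHED.matcher(line);
            if (m.find()) {
                states.put(m.group(1), m.group(2));
            }
        }
        return states;
    }

    // Counts the executors whose final state equals the given one.
    static long countState(Map<String, String> states, String state) {
        return states.values().stream().filter(state::equals).count();
    }

    public static void main(String[] args) {
        // Lines copied from the worker log in this thread (launch lines omitted).
        List<String> lines = Arrays.asList(
            "13/11/11 19:16:47 INFO Worker: Executor app-20131111190659-0000/0 finished with state FAILED message Command exited with code 1 exitStatus 1",
            "13/11/11 19:16:53 INFO Worker: Executor app-20131111190659-0000/2 finished with state FAILED message Command exited with code 1 exitStatus 1",
            "13/11/11 19:17:02 INFO Worker: Executor app-20131111190659-0000/4 finished with state FAILED message Command exited with code 1 exitStatus 1",
            "13/11/11 19:17:09 INFO Worker: Executor app-20131111190659-0000/6 finished with state FAILED message Command exited with code 1 exitStatus 1",
            "13/11/11 19:17:17 INFO Worker: Executor app-20131111190659-0000/8 finished with state FAILED message Command exited with code 1 exitStatus 1",
            "13/11/11 19:17:20 INFO Worker: Asked to kill executor app-20131111190659-0000/10",
            "13/11/11 19:17:21 INFO Worker: Executor app-20131111190659-0000/10 finished with state KILLED");
        Map<String, String> states = finalStates(lines);
        System.out.println("FAILED: " + countState(states, "FAILED")); // prints "FAILED: 5"
        System.out.println("KILLED: " + countState(states, "KILLED")); // prints "KILLED: 1"
    }
}
```

For these sample lines it reports five FAILED executors and one KILLED, which matches the five failed attempts before the kill described in the question. The parser only summarizes what the worker saw; the reason behind each exit status 1 still has to come from the executors' own stderr files.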

