Hi All,

When a Spark job (Spark 1.5.2) is submitted with a single executor and the user passes invalid JVM arguments via spark.executor.extraJavaOptions, the first executor fails to start. The job does not fail, though: it keeps creating a new executor, which fails the same way every time, until CTRL-C is pressed. Is there a configuration setting to limit the number of retry attempts?

Example:

./spark-submit --class SimpleApp --master "spark://10.10.72.145:7077"  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=16" /SPARK/SimpleApp.jar

The executor fails with:

Error occurred during initialization of VM
Can't have more ConcGCThreads than ParallelGCThreads.

But the job does not exit; it keeps creating new executors and retrying:
..........
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2846 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now RUNNING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2846 is now EXITED (Command exited with code 1)
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor app-20160201065319-0014/2846 removed: Command exited with code 1
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2846
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: app-20160201065319-0014/2847 on worker-20160131230345-10.10.72.145-36558 (10.10.72.145:36558) with 12 cores
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2847 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2847 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2847 is now EXITED (Command exited with code 1)
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Executor app-20160201065319-0014/2847 removed: Command exited with code 1
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2847
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor added: app-20160201065319-0014/2848 on worker-20160131230345-10.10.72.145-36558 (10.10.72.145:36558) with 12 cores
16/02/01 06:54:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160201065319-0014/2848 on hostPort 10.10.72.145:36558 with 12 cores, 2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated: app-20160201065319-0014/2848 is now RUNNING
............
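
For anyone who wants to reproduce the JVM error outside Spark, here is a minimal sketch of a pre-flight check. The function name check_jvm_opts is hypothetical and is not part of Spark. It only starts a throwaway JVM with the candidate options and reports whether the JVM initializes. Since the default ParallelGCThreads is derived from the CPU count, I assume the check is run on the worker host or on a machine with the same number of cores, using the same Java installation as the executors.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Spark): start a throwaway JVM
# with the candidate options and report whether it initializes.
# Usage: check_jvm_opts <java-binary> <option>...
check_jvm_opts() {
    java_bin=$1
    shift
    # "-version" makes the JVM parse its flags, initialize, and exit at once.
    if "$java_bin" "$@" -version >/dev/null 2>&1; then
        echo "accepted"
    else
        echo "rejected"
        return 1
    fi
}
```

It could be called before spark-submit, for example:

    check_jvm_opts java -XX:+UseG1GC -XX:ConcGCThreads=16 && ./spark-submit ...

This does not limit the executor retries; it only catches options that the JVM refuses at startup before the job is submitted.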



Thanks,
Prabhu Joseph