spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Many executors with the same ID in web UI (under Executors)?
Date Sat, 18 Jun 2016 17:12:07 GMT
BTW you can see the cause of these failures:

Container exited with a non-zero exit code 52
16/06/18 17:51:02 ERROR TaskSetManager: Task 42 in stage 43.0 failed 4
times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 42
in stage 43.0 failed 4 times, most recent failure: Lost task 42.3 in stage
43.0 (TID 2828, rhes564): ExecutorLostFailure (executor 9 exited caused by
one of the running tasks) Reason: Container marked as failed:
container_1465627515776_0032_01_000010 on host: rhes564. Exit status: 52.
Diagnostics: Exception from container-launch.
Container id: container_1465627515776_0032_01_000010
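
For the record, exit code 52 is SparkExitCode.OOM in the Spark source, i.e.
the executor JVM died with an OutOfMemoryError. Assuming memory really is
the culprit here (a guess from the exit code alone), a first thing to try
would be giving each executor more heap and/or off-heap overhead, along
these lines:

    ${SPARK_HOME}/bin/spark-submit \
        --master yarn \
        --executor-memory 2G \
        --conf "spark.yarn.executor.memoryOverhead=768" \
        ...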

And the same from the YARN NodeManager log:

2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from
container-launch.
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id:
container_1465627515776_0032_01_000010
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 52
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace:
ExitCodeException exitCode=52:
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell.run(Shell.java:455)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.lang.Thread.run(Thread.java:745)
2016-06-18 17:51:01,249 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container exited with a non-zero exit code 52
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1465627515776_0032_01_000010 transitioned from RUNNING
to EXITED_WITH_FAILURE
2016-06-18 17:51:01,249 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1465627515776_0032_01_000010
2016-06-18 17:51:01,260 WARN
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
OPERATION=Container Finished - Failed   TARGET=ContainerImpl
RESULT=FAILURE  DESCRIPTION=Container failed with state:
EXITED_WITH_FAILURE    APPID=application_1465627515776_0032
CONTAINERID=container_1465627515776_0032_01_000010
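
Once the application has finished, you can pull the aggregated logs for the
failed container with the YARN CLI (assuming log aggregation is enabled),
which usually shows the actual error that killed the JVM:

    yarn logs -applicationId application_1465627515776_0032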

So I am not sure what other information is needed.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 18 June 2016 at 17:53, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Mine is 1.6.1, running on YARN with one worker, all on a 1-node cluster.
>
> This is my own version of spark-shell:
>
>     export SPARK_SUBMIT_OPTS
>     ${SPARK_HOME}/bin/spark-submit \
>                 --driver-memory=8G \
>                 --num-executors=8 \
>                 --executor-memory=1G \
>                 --master yarn \
>                 --deploy-mode client \
>                 --executor-cores=8 \
>                 --conf "spark.scheduler.mode=FAIR" \
>                 --conf "spark.ui.port=55555" \
>                 --conf "spark.driver.port=54631" \
>                 --conf "spark.fileserver.port=54731" \
>                 --conf "spark.blockManager.port=54832" \
>                 --conf "spark.kryoserializer.buffer.max=512" \
>                 --conf
> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps" \
>                 --class org.apache.spark.repl.Main \
>                 --name "my own Spark shell on YARN" "$@"
>
>
> With driver-memory=8G, num-executors=8 and executor-memory=1G.
>
> But in my case there are two executors: the driver itself plus an
> incremental executor ID, like below:
>
> [image: Inline images 1]
>
> However, that is consistent with the resources I have. After all, YARN
> decides how to allocate resources, not the submitter.
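>
> As a rough sanity check on the sizing (assuming the Spark 1.6 defaults),
> each executor container asks YARN for executor memory plus
> spark.yarn.executor.memoryOverhead, which defaults to max(384MB, 10% of
> executor memory):
>
>     overhead  = max(384, 0.10 * 1024) = 384 MB
>     container = 1024 + 384 = 1408 MB per executor
>
> so num-executors=8 would need roughly 11GB of YARN memory before counting
> the driver, which a small 1-node cluster may well not grant in full.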
>
> In my Jobs page I have Completed and Failed Jobs
>
> [image: Inline images 2]
>
>
> So I am not sure what is happening in your case.
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 18 June 2016 at 17:35, Jacek Laskowski <jacek@japila.pl> wrote:
>
>> Hi Mich,
>>
>> That's correct -- they're indeed duplicates in the table, but not at the
>> OS level. The reason for this *might* be that separate stdout and stderr
>> logs need to be kept for the failed execution(s). I'm using
>> --num-executors 2 and there are two executor backends.
>>
>> $ jps -l
>> 28865 sun.tools.jps.Jps
>> 802 com.typesafe.zinc.Nailgun
>> 28276 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
>> 28804 org.apache.spark.executor.CoarseGrainedExecutorBackend
>> 15450
>> 28378 org.apache.hadoop.yarn.server.nodemanager.NodeManager
>> 28778 org.apache.spark.executor.CoarseGrainedExecutorBackend
>> 28748 org.apache.spark.deploy.yarn.ExecutorLauncher
>> 28463 org.apache.spark.deploy.SparkSubmit
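>>
>> A quick way to count the executor JVMs from the shell (there is one
>> CoarseGrainedExecutorBackend per live executor) is:
>>
>> $ jps -l | grep CoarseGrainedExecutorBackend | wc -l
>>
>> which prints 2 here, matching --num-executors 2.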
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Sat, Jun 18, 2016 at 6:16 PM, Mich Talebzadeh
>> <mich.talebzadeh@gmail.com> wrote:
>> > Can you please run jps on the 1-node host and send the output? Some of
>> > those executor IDs are just duplicates!
>> >
>> > HTH
>> >
>> > Dr Mich Talebzadeh
>> >
>> >
>> >
>> > LinkedIn
>> >
>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> >
>> >
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>> >
>> >
>> >
>> > On 18 June 2016 at 17:08, Jacek Laskowski <jacek@japila.pl> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks Mich and Akhil for such prompt responses! Here's the screenshot
>> >> [1] which is a part of
>> >> https://issues.apache.org/jira/browse/SPARK-16047 I reported today (to
>> >> have the executors sorted by status and id).
>> >>
>> >> [1]
>> >>
>> >> https://issues.apache.org/jira/secure/attachment/12811665/spark-webui-executors.png
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >> ----
>> >> https://medium.com/@jaceklaskowski/
>> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >> Follow me at https://twitter.com/jaceklaskowski
>> >>
>> >>
>> >> On Sat, Jun 18, 2016 at 6:05 PM, Akhil Das <akhld@hacked.work> wrote:
>> >> > A screenshot of the Executors tab will explain it better. Usually
>> >> > executors are allocated when the job is started; if you have a
>> >> > multi-node cluster then you'll see executors launched on different
>> >> > nodes.
>> >> >
>> >> > On Sat, Jun 18, 2016 at 9:04 PM, Jacek Laskowski <jacek@japila.pl>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> This is for Spark on YARN - a 1-node cluster with Spark
>> >> >> 2.0.0-SNAPSHOT (today's build).
>> >> >>
>> >> >> I can understand that when a stage fails, a new executor entry shows
>> >> >> up in the web UI under the Executors tab (corresponding to a stage
>> >> >> attempt). I understand that this is to keep the stdout and stderr
>> >> >> logs for future reference.
>> >> >>
>> >> >> Why are there multiple executor entries under the same executor
>> >> >> IDs?
>> >> >> What are the executor entries exactly? When are the new ones created
>> >> >> (after a Spark application is launched and assigned the
>> >> >> --num-executors executors)?
>> >> >>
>> >> >> Pozdrawiam,
>> >> >> Jacek Laskowski
>> >> >> ----
>> >> >> https://medium.com/@jaceklaskowski/
>> >> >> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> >> >> Follow me at https://twitter.com/jaceklaskowski
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> >> For additional commands, e-mail: user-help@spark.apache.org
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Cheers!
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>
>> >
>>
>
>
