It looks like the Spark history server should account for the lost executors by analyzing the output of the 'yarn logs -applicationId <application-id>' command.


First of all, after sending my email to the mailing list, I used yarn logs -applicationId <application-id> and successfully retrieved the aggregated log. I found the exceptions I was looking for.
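For anyone else searching the aggregated log, here is a minimal Python sketch of how the dump could be split per container and scanned for exceptions. It assumes each container's section starts with a header line of the form "Container: <container-id> on <host>_<port>", which may differ between Hadoop versions. The function names and the keyword list are my own, not part of any YARN or Spark tool.

```python
import re

# Assumed header format in the aggregated log output; adjust if your
# Hadoop version prints something different.
CONTAINER_HEADER = re.compile(r"^Container: (\S+) on (\S+)")


def split_by_container(log_text):
    """Group the lines of an aggregated log dump by container ID."""
    sections = {}
    current = None
    for line in log_text.splitlines():
        match = CONTAINER_HEADER.match(line)
        if match:
            # A new container section starts here.
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def containers_with_errors(log_text, keywords=("Exception", "ERROR")):
    """Return {container_id: [matching lines]} for containers whose
    logs contain any of the given keywords."""
    hits = {}
    for container, lines in split_by_container(log_text).items():
        matched = [l for l in lines if any(k in l for k in keywords)]
        if matched:
            hits[container] = matched
    return hits
```

To use it, redirect the command output to a file (for example yarn logs -applicationId <application-id> > app.log), read the file into a string, and pass it to containers_with_errors. The containers of the lost executors should then show up with their exception lines, even though they are missing from the "Executors" tab.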

Now, as to your suggestion: when I go to the YARN RM UI, I can only see the "Tracking URL" in the application overview section. When I click it, it takes me to the Spark history server UI, where I cannot find the lost executors. The only log link I can find on the YARN RM site is the ApplicationMaster log, which is not what I need. Did I miss something?


Can you go to the YARN RM UI and find all the attempts for this Spark job?

The two lost executors should be found there.

When running a Spark job on YARN, 2 executors somehow got lost during the execution. The message on the history server GUI is “CANNOT find address”. Two extra executors were launched by YARN and eventually finished the job. Usually I go to the “Executors” tab on the UI to check the executor stdout/stderr for troubleshooting. Now if I go to the “Executors” tab, I do not see the 2 executors that were lost. I can only see the remaining executors and the 2 new executors. Thus I cannot check the stdout/stderr of the lost executors. How can I access the log files of these lost executors to find out why they were lost?