Looks like the Spark history server should take the lost executors into account, perhaps by analyzing the output of the 'yarn logs -applicationId' command.
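For reference, the aggregated logs can be pulled like this once the application has finished (the application and container IDs below are hypothetical placeholders; this assumes yarn.log-aggregation-enable is set to true on the cluster):

```shell
# Fetch all aggregated container logs for the application
yarn logs -applicationId application_1443600000000_0001 > app_logs.txt

# Narrow down to one lost executor's container, if its container ID
# is known from the driver/AM log (placeholder ID shown here)
yarn logs -applicationId application_1443600000000_0001 \
    -containerId container_1443600000000_0001_01_000003
```

Grepping the dumped file for the lost executor IDs or for "ExecutorLostFailure" is usually the quickest way to locate the relevant stderr.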


On Thu, Oct 1, 2015 at 11:46 AM, Lan Jiang <ljiang2@gmail.com> wrote:


Thanks for your reply.

First of all, after sending the email to the mailing list, I used 'yarn logs -applicationId <application-id>' to retrieve the aggregated logs successfully, and I found the exceptions I was looking for.

Now, as to your suggestion: when I go to the YARN RM UI, I can only see the "Tracking URL" in the application overview section. Clicking it brings me to the Spark history server UI, where I cannot find the lost executors. The only log link I can find on the YARN RM site is the ApplicationMaster log, which is not what I need. Did I miss something?


On Thu, Oct 1, 2015 at 1:30 PM, Ted Yu <yuzhihong@gmail.com> wrote:
Can you go to the YARN RM UI to find all the attempts for this Spark job?

The two lost executors should be found there.

On Thu, Oct 1, 2015 at 10:30 AM, Lan Jiang <ljiang2@gmail.com> wrote:
Hi, there

When running a Spark job on YARN, 2 executors somehow got lost during the execution. The message on the history server GUI is “CANNOT find address”.  Two extra executors were launched by YARN and eventually finished the job. Usually I go to the “Executors” tab on the UI to check the executor stdout/stderr for troubleshoot. Now if I go to the “Executors” tab,  I do not see the 2 executors that were lost. I can only see the rest executors and the 2 new executors. Thus I cannot check the stdout/stderr of the lost executors. How can I access the log files of these lost executors to find out why they were lost?