On 9 Feb 2016, at 05:55, Prabhu Joseph <prabhujose.gates@gmail.com> wrote:

+ Spark-Dev

On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <prabhujose.gates@gmail.com> wrote:
Hi All,

    A long-running Spark job on YARN throws the exception below after running for a few days.

yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
org.apache.hadoop.yarn.exceptions.YarnException: No AMRMToken found for user prabhu
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)

Would any of the options below renew the AMRMToken and solve the issue?

1. Increasing yarn.resourcemanager.delegation.token.max-lifetime from 7 days

2. Configuring Proxy user:

<property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value> </property>
<property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value> </property>

wouldn't do that: security issues

3. Can Spark 1.4.0 handle this with the fix in https://issues.apache.org/jira/browse/SPARK-5342?


I'll say "maybe" there

How can the AMRMToken be renewed for a long-running job on YARN?

AMRM token renewal should be automatic in the AM; YARN sends a message to the AM (actually an allocate() response with no containers but a new token at the tail of the message).
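To make that flow concrete, here is a rough sketch of what the AM side presumably does with such a response. It is a toy model only: AllocateResponse and SketchAMClient below are hypothetical stand-ins written for this mail, not the real Hadoop classes, and the token strings are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AllocateResponse:
    """Hypothetical stand-in for the RM's allocate() response."""
    containers: List[str]
    # Assumption: the field is set only when the RM hands out a new token.
    am_rm_token: Optional[str] = None


class SketchAMClient:
    """Hypothetical AM-side client; NOT the real AMRMClientImpl."""

    def __init__(self, initial_token: str):
        self.token = initial_token
        self.log: List[str] = []

    def handle(self, response: AllocateResponse) -> List[str]:
        # If the response carries a new token, swap it in for later calls
        # and record that the swap happened.
        if response.am_rm_token is not None:
            self.token = response.am_rm_token
            self.log.append("AMRM token updated")
        return response.containers


client = SketchAMClient("token-v1")
# Ordinary heartbeat: a container is granted, no token attached.
client.handle(AllocateResponse(containers=["container_01"]))
# Renewal heartbeat: no containers, but a new token at the tail.
client.handle(AllocateResponse(containers=[], am_rm_token="token-v2"))
print(client.token, client.log)
```

The point of the sketch is that the AM never asks for a renewal itself; it only has to keep heartbeating and pick up whatever token comes back. The log line on the swap is the kind of visibility that is missing today.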

I don't see any logging in the Hadoop code there (AMRMClientImpl); filed YARN-4682 to add a log statement.

If someone other than me were to supply a patch to that JIRA to add a log statement *by the end of the day*, I'll review it and get it into Hadoop 2.8.