tez-dev mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: occasional submitDag error
Date Fri, 18 Sep 2015 16:12:18 GMT
The only layer I can think of in that path is the recovery layer, which makes a blocking
call to write the dag submission information to the recovery log. If that write fails, the dag
submission should also fail. It could potentially be optimized to let the dag submission go
through and run the dag with recovery disabled ( a double-edged sword if the dag is a large
one ).
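Roughly, the behavior described above can be sketched as follows. This is a simplified illustration, not actual Tez code: the names `RecoveryLog`, `writeSubmissionEvent`, and `submitDag` are hypothetical, and the `allowRecoveryDisabled` branch models the possible optimization, not current behavior.

```java
import java.io.IOException;

public class SubmitSketch {
    // Hypothetical stand-in for the recovery log; not an actual Tez class.
    interface RecoveryLog {
        void writeSubmissionEvent(String dagId) throws IOException;
    }

    static String submitDag(String dagId, RecoveryLog log, boolean allowRecoveryDisabled) {
        try {
            // Blocking call: the submission info must reach the recovery log first.
            log.writeSubmissionEvent(dagId);
        } catch (IOException e) {
            if (!allowRecoveryDisabled) {
                // Current behavior: a failed recovery write fails the submission.
                throw new RuntimeException("DAG submission failed: recovery write failed", e);
            }
            // Possible optimization: run the DAG anyway with recovery disabled
            // (risky for a large DAG, since an AM crash would lose all progress).
            return "RUNNING_NO_RECOVERY";
        }
        return "RUNNING";
    }

    public static void main(String[] args) {
        RecoveryLog failing = dagId -> { throw new IOException("disk full"); };
        System.out.println(submitDag("dag_1", failing, true)); // prints RUNNING_NO_RECOVERY
    }
}
```

Either way, the failure mode is a thrown exception or a degraded-recovery run, not a JVM shutdown, which is why this path alone does not account for the log below.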

However, that does not really explain the “YarnUncaughtExceptionHandler: Thread Thread[IPC
Server handler 0 on 50500,5,main] threw an Error.  Shutting down now…” 
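That log line comes from a JVM-wide uncaught exception handler. As a rough sketch (a simplified analogue, not the actual YarnUncaughtExceptionHandler implementation), an Error escaping an IPC handler thread reaches the handler, which treats Errors as fatal and shuts the process down instead of letting the thread die quietly:

```java
public class HandlerSketch {
    static volatile String lastAction = "none";

    // Simplified analogue of YarnUncaughtExceptionHandler: on an Error it would
    // log "... threw an Error. Shutting down now..." and call ExitUtil.terminate(-1);
    // here we only record the decision so the behavior is observable.
    static class ShutdownOnError implements Thread.UncaughtExceptionHandler {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            lastAction = (e instanceof Error) ? "shutdown" : "log-only";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread ipcHandler = new Thread(() -> {
            // An Error (e.g. StackOverflowError, AssertionError) escaping the RPC call path.
            throw new AssertionError("escaped from RPC call");
        }, "IPC Server handler 0 on 50500");
        ipcHandler.setUncaughtExceptionHandler(new ShutdownOnError());
        ipcHandler.start();
        ipcHandler.join();
        System.out.println(lastAction); // prints shutdown
    }
}
```

So the interesting question is what Error made it all the way out of the submitDAG handler without being caught and translated into an RPC failure.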

TEZ-2663 is something that could be applicable to branch 0.6, but I am not sure it will help
track this particular issue down.

thanks
— Hitesh

On Sep 18, 2015, at 8:41 AM, Jonathan Eagles <jeagles@gmail.com> wrote:

> I have a user running 0.6.1 tez (git hash
> 6e588d15184dc691df1c0227f40db91d9bc6d7d6), and a few times a month a
> submitted job returns this error.
> 
> 2015-09-16 01:04:05,069 INFO [IPC Server handler 0 on 50500]
> app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
> 2015-09-16 01:04:05,617 INFO [IPC Server handler 0 on 50500]
> history.HistoryEventHandler:
> [HISTORY][DAG:dag_1440165794704_806241_1][Event:DAG_SUBMITTED]:
> dagID=dag_1440165794704_806241_1, submitTime=1442365444946
> 2015-09-16 01:04:05,676 WARN [IPC Server handler 0 on 50500]
> ipc.Server: IPC Server handler 0 on 50500, call
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG
> from hostname:55741 Call#585 Retry#0
> 2015-09-16 01:04:05,679 FATAL [IPC Server handler 0 on 50500]
> yarn.YarnUncaughtExceptionHandler: Thread Thread[IPC Server handler 0
> on 50500,5,main] threw an Error.  Shutting down now...
> 2015-09-16 01:04:05,764 INFO [IPC Server handler 0 on 50500]
> util.ExitUtil: Exiting with status -1
> 
> 
> Normally it looks like this
> 
> 2015-09-17 10:03:40,234 INFO [IPC Server handler 0 on 50503]
> app.DAGAppMaster: Running DAG: PigLatin:user_1.pig-0_scope-0
> 2015-09-17 10:03:40,659 INFO [IPC Server handler 0 on 50503]
> history.HistoryEventHandler:
> [HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_SUBMITTED]:
> dagID=dag_1440165794704_873289_1, submitTime=1442484220062
> 2015-09-17 10:03:40,694 INFO [IPC Server handler 0 on 50503]
> impl.VertexImpl: setting additional outputs for vertex scope-2573
> 2015-09-17 10:03:40,696 INFO [IPC Server handler 0 on 50503]
> impl.DAGImpl: Using DAG Scheduler:
> org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder
> 2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503]
> history.HistoryEventHandler:
> [HISTORY][DAG:dag_1440165794704_873289_1][Event:DAG_INITIALIZED]:
> dagID=dag_1440165794704_873289_1, initTime=1442484220660
> 2015-09-17 10:03:40,698 INFO [IPC Server handler 0 on 50503]
> impl.DAGImpl: dag_1440165794704_873289_1 transitioned from NEW to
> INITED
> ...
> 
> and then it succeeds.
> 
> Are there any known exceptions escaping over rpc that have since been
> fixed and would solve this issue?
> 
> Jon

