tez-dev mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-3028) Improvements to error handling
Date Thu, 07 Jan 2016 22:58:39 GMT
Siddharth Seth created TEZ-3028:
-----------------------------------

             Summary: Improvements to error handling
                 Key: TEZ-3028
                 URL: https://issues.apache.org/jira/browse/TEZ-3028
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Siddharth Seth


There are several places where exceptions can reach the Dispatcher, which can cause a restart of the AM.
Some of these may be valid; each one needs to be evaluated.
For example, TaskCommunicatorManager tracks known containers, among other things. In case of an error, it throws
an unchecked exception, which I believe will reach the dispatcher directly. (Something like
this happening would indicate a bug in the framework.) Should this trigger a restart of the
AM, or should the AM shut down with an internal error?
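One way to make the "shut down with an internal error" option explicit is to guard each component's event handler at the boundary, so an unchecked exception is routed to a dedicated error path instead of escaping to the dispatcher thread. The sketch below is a minimal, self-contained illustration under that assumption. `ErrorReporter` and `GuardedHandler` are hypothetical names, not Tez or YARN APIs, and `java.util.function.Consumer` stands in for the real event-handler interface.

```java
import java.util.function.Consumer;

// Hypothetical error path for the AM (NOT a Tez API). An implementation
// could decide to shut the AM down with an internal error.
interface ErrorReporter {
    void internalError(String component, Throwable cause);
}

// Hypothetical wrapper: runs a component's handler and converts any
// unchecked exception (likely a framework bug, e.g. an unknown container
// in TaskCommunicatorManager) into a call on the ErrorReporter, so the
// exception never propagates to the dispatcher.
final class GuardedHandler<E> implements Consumer<E> {
    private final String component;
    private final Consumer<E> delegate;
    private final ErrorReporter reporter;

    GuardedHandler(String component, Consumer<E> delegate, ErrorReporter reporter) {
        this.component = component;
        this.delegate = delegate;
        this.reporter = reporter;
    }

    @Override
    public void accept(E event) {
        try {
            delegate.accept(event);
        } catch (RuntimeException e) {
            // Deliberately not rethrown: the reporter, not the dispatcher,
            // decides between shutdown and any other recovery.
            reporter.internalError(component, e);
        }
    }
}
```

The design choice here is that the decision "restart vs. shut down" is made in one place (the reporter) rather than implicitly by whatever the dispatcher does with an escaped exception.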

The TaskSchedulerManager handles exceptions while processing events and dispatches a generic
INTERNAL_ERROR event to the DAGAppMaster. This event could be augmented with the reason for the error so
that the diagnostics are displayed correctly (or at least posted to the history service).
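A minimal sketch of what "augmenting the event with the reason" could look like, assuming the event simply carries the originating component and a diagnostics string built from the caught exception. `InternalErrorEvent` and `fromThrowable` are hypothetical names; Tez's actual INTERNAL_ERROR event type does not necessarily have this shape.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical internal-error event that carries its own diagnostics
// (NOT the real Tez event class).
final class InternalErrorEvent {
    private final String source;
    private final String diagnostics;

    InternalErrorEvent(String source, String diagnostics) {
        this.source = source;
        this.diagnostics = diagnostics;
    }

    // Builds the event from a caught exception: component name, the
    // exception's toString() (class and message), then the full stack
    // trace, so the reason can be shown to the user or posted to the
    // history service.
    static InternalErrorEvent fromThrowable(String source, Throwable t) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        t.printStackTrace(pw);
        pw.flush();
        String diag = "Error in " + source + ": " + t + "\n" + sw;
        return new InternalErrorEvent(source, diag);
    }

    String getSource() { return source; }

    String getDiagnostics() { return diagnostics; }
}
```

With the reason attached at the point where the exception is caught, the receiver (here, the DAGAppMaster) no longer has to reconstruct it from logs.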



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
