tez-dev mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-3028) Improvements to error handling
Date Thu, 07 Jan 2016 22:58:39 GMT
Siddharth Seth created TEZ-3028:

             Summary: Improvements to error handling
                 Key: TEZ-3028
                 URL: https://issues.apache.org/jira/browse/TEZ-3028
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Siddharth Seth

There are several places where exceptions can reach the Dispatcher, which can cause a restart of the AM.
Some of these paths may be valid and need to be evaluated.
For example, TaskCommunicatorManager tracks known containers, among other things. On an error, it throws
an unchecked exception, which I believe will reach the Dispatcher directly. (Something like
this happening would indicate a bug in the framework.) Should this trigger a restart of the
AM, or a shutdown with an internal error?
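One possible answer to the question above, sketched under assumptions: wrap each component's event handler so that an unchecked exception is turned into an explicit internal-error report instead of escaping to the dispatcher thread. Everything here is hypothetical (Event, InternalErrorReport, and guard() are illustrative names, not Tez classes or APIs), and the issue itself does not decide between restart and shutdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GuardedDispatchSketch {

    // Hypothetical stand-in for an AM event; not a Tez class.
    record Event(String type, String payload) {}

    // Hypothetical stand-in for an AM-level "shut down with internal error" notification.
    record InternalErrorReport(String sourceComponent, String diagnostics) {}

    /**
     * Wraps a component's handler so that an unchecked exception is converted
     * into an InternalErrorReport instead of propagating to the dispatcher
     * thread, where it could otherwise trigger an AM restart.
     */
    static Consumer<Event> guard(String component, Consumer<Event> handler,
                                 Consumer<InternalErrorReport> onInternalError) {
        return event -> {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                // A framework bug (e.g. an unknown container) ends up here.
                onInternalError.accept(new InternalErrorReport(component,
                        "Error while handling " + event.type() + ": " + e));
            }
        };
    }

    public static void main(String[] args) {
        List<InternalErrorReport> errors = new ArrayList<>();
        // Hypothetical handler that rejects containers it does not know about.
        Consumer<Event> taskComm = guard("TaskCommunicatorManager", event -> {
            if (!event.payload().equals("container_1")) {
                throw new IllegalStateException("Unknown container: " + event.payload());
            }
        }, errors::add);

        taskComm.accept(new Event("CONTAINER_LAUNCHED", "container_1")); // handled normally
        taskComm.accept(new Event("CONTAINER_LAUNCHED", "container_2")); // becomes an internal error
        System.out.println(errors.size() + " internal error(s): " + errors);
    }
}
```

Converting the exception at the component boundary keeps the decision (restart vs. shutdown) in one place, the receiver of the report, rather than leaving it to whatever the dispatcher does with an uncaught exception.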

The TaskSchedulerManager handles exceptions while processing events and dispatches a generic
INTERNAL_ERROR to the DAGAppMaster. This event could be augmented with the reason for the error, so
that diagnostics are displayed correctly (or at least posted to the history service).
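A minimal sketch of the proposed augmentation, assuming the error event simply carries a diagnostics string built from the caught exception. AmEvent, AppMaster, and internalError() are hypothetical names for illustration; the list in AppMaster only stands in for posting to the history service and is not a real Tez API.

```java
import java.util.ArrayList;
import java.util.List;

public class InternalErrorDiagnosticsSketch {

    // Hypothetical event type, loosely modelled on the generic INTERNAL_ERROR in the issue.
    enum AmEventType { INTERNAL_ERROR }

    // Hypothetical event that carries a reason, not just a type.
    record AmEvent(AmEventType type, String diagnostics) {}

    // Hypothetical AM; the list stands in for the history service.
    static class AppMaster {
        final List<String> historyDiagnostics = new ArrayList<>();
        boolean shuttingDown = false;

        void handle(AmEvent event) {
            if (event.type() == AmEventType.INTERNAL_ERROR) {
                historyDiagnostics.add(event.diagnostics()); // record why, not only that
                shuttingDown = true;
            }
        }
    }

    // Hypothetical scheduler-side helper: build the error event from the caught exception.
    static AmEvent internalError(String component, Throwable cause) {
        return new AmEvent(AmEventType.INTERNAL_ERROR,
                component + " failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage());
    }

    public static void main(String[] args) {
        AppMaster am = new AppMaster();
        try {
            throw new IllegalArgumentException("no such task attempt");
        } catch (RuntimeException e) {
            am.handle(internalError("TaskSchedulerManager", e));
        }
        // prints "[TaskSchedulerManager failed: IllegalArgumentException: no such task attempt]"
        System.out.println(am.historyDiagnostics);
    }
}
```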

This message was sent by Atlassian JIRA
