tez-dev mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-3072) Node blacklisting always reruns completed non-leaf tasks
Date Mon, 25 Jan 2016 19:29:39 GMT
Jason Lowe created TEZ-3072:

             Summary: Node blacklisting always reruns completed non-leaf tasks
                 Key: TEZ-3072
                 URL: https://issues.apache.org/jira/browse/TEZ-3072
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Jason Lowe

Recently a user ran a job with many vertices, and a bug in the user's code caused a problem
in one of the trailing vertices of the job.  On some nodes enough tasks failed that the AM
decided it needed to blacklist those nodes.  That blacklisting then caused many completed
vertices to re-run, because the AM thought it needed to re-execute the non-leaf tasks that
had completed on those nodes.  This wasted a lot of cluster resources and job time for no
benefit.

