tez-dev mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TEZ-3239) ShuffleVertexManager recovery issue when auto parallelism is enabled
Date Wed, 07 Sep 2016 00:39:21 GMT

     [ https://issues.apache.org/jira/browse/TEZ-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma resolved TEZ-3239.
--------------------------
    Resolution: Invalid

Verified that the issue no longer exists in the master branch.

> ShuffleVertexManager recovery issue when auto parallelism is enabled
> --------------------------------------------------------------------
>
>                 Key: TEZ-3239
>                 URL: https://issues.apache.org/jira/browse/TEZ-3239
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Ming Ma
>         Attachments: tez.am.recovery.attempt.auto.parallelism.log
>
>
> Repro:
> * Enable {{tez.shuffle-vertex-manager.enable.auto-parallel}}.
> * Kill the Tez AM container after the job has reached the point where the vertex manager (VM) has reconfigured the edge.
> * The new Tez AM attempt fails with the following error:
> {noformat}
> org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist
> at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:497)
> at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:589)
> at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:658)
> at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:653)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> {noformat}
> This happens because the edge's data movement type changes to {{DataMovementType.CUSTOM}} after reconfiguration, so the recovered edge no longer matches the check below. Allowing {{DataMovementType.CUSTOM}} in the following check appears to fix the issue:
> {noformat}
>       if (entry.getValue().getDataMovementType() == DataMovementType.SCATTER_GATHER) {
>         bipartiteSources++;
>       }
> {noformat}
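The failure mode described above can be illustrated with a small, self-contained sketch. It does not use the real Tez classes. The local enum DataMovementType only mirrors the constant names of Tez's EdgeProperty.DataMovementType. The helpers countStrict, countRelaxed, and checkBipartite are hypothetical, and IllegalStateException stands in for TezUncheckedException. The sketch compares the quoted SCATTER_GATHER-only check with the relaxed variant suggested in the report, run against a recovered edge set in which the reconfigured edge is CUSTOM. It is not the change that actually landed in master.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BipartiteSourceCheck {

  // Hypothetical stand-in for org.apache.tez.dag.api.EdgeProperty.DataMovementType;
  // the constant names match Tez, but this enum is local to the sketch.
  enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM }

  // Mirrors the quoted check: only SCATTER_GATHER edges count as bipartite sources.
  static int countStrict(Map<String, DataMovementType> sources) {
    int bipartiteSources = 0;
    for (Map.Entry<String, DataMovementType> entry : sources.entrySet()) {
      if (entry.getValue() == DataMovementType.SCATTER_GATHER) {
        bipartiteSources++;
      }
    }
    return bipartiteSources;
  }

  // Relaxed variant suggested in the report: CUSTOM edges (what an edge looks
  // like after auto-parallelism reconfiguration) also count.
  static int countRelaxed(Map<String, DataMovementType> sources) {
    int bipartiteSources = 0;
    for (DataMovementType type : sources.values()) {
      if (type == DataMovementType.SCATTER_GATHER || type == DataMovementType.CUSTOM) {
        bipartiteSources++;
      }
    }
    return bipartiteSources;
  }

  // Hypothetical stand-in for the precondition enforced in onVertexStarted().
  static void checkBipartite(int count) {
    if (count < 1) {
      throw new IllegalStateException("Atleast 1 bipartite source should exist");
    }
  }

  public static void main(String[] args) {
    // Source edges as seen by a recovered AM attempt after the edge was reconfigured.
    Map<String, DataMovementType> recovered = new LinkedHashMap<>();
    recovered.put("map", DataMovementType.CUSTOM);
    recovered.put("lookup", DataMovementType.BROADCAST);

    System.out.println("strict=" + countStrict(recovered));   // prints "strict=0"
    System.out.println("relaxed=" + countRelaxed(recovered)); // prints "relaxed=1"

    checkBipartite(countRelaxed(recovered)); // passes with the relaxed count
    try {
      checkBipartite(countStrict(recovered)); // reproduces the reported failure
    } catch (IllegalStateException e) {
      System.out.println("strict check failed: " + e.getMessage());
    }
  }
}
```

With only the reconfigured CUSTOM edge and a BROADCAST edge present, the strict count is zero and the precondition fails. That matches the stack trace above. The relaxed count treats the reconfigured edge as bipartite, and the check passes.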



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
