tez-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] jteagles commented on a change in pull request #37: TEZ-4042: Speculative attempts should avoid running on the same node
Date Fri, 22 Feb 2019 05:50:00 GMT
jteagles commented on a change in pull request #37: TEZ-4042: Speculative attempts should avoid
running on the same node
URL: https://github.com/apache/tez/pull/37#discussion_r259218826
 
 

 ##########
 File path: tez-dag/src/main/java/org/apache/tez/dag/app/rm/DagAwareYarnTaskScheduler.java
 ##########
 @@ -567,8 +568,9 @@ private void informAppAboutAssignments(List<Assignment> assignments) {
    * @param container the container assigned to the task
    */
   private void informAppAboutAssignment(TaskRequest request, Container container) {
-    if (blacklistedNodes.contains(container.getNodeId())) {
-      Object task = request.getTask();
+    Object task = request.getTask();
+    if (blacklistedNodes.contains(container.getNodeId())
+        || task instanceof TaskAttempt && ((TaskAttempt) task).getUnhealthyNodesHistory().contains(container.getNodeId())) {
 
 Review comment:
   I think informAppAboutAssignment does avoid scheduling the speculative attempt on the nodes
already running tasks, so that is good. However, as a side effect it deallocates free
containers on nodes running attempts that have been speculated. If we knew for certain that
the node was slow, we could treat the node as unhealthy. But choosing a task for speculation
is just a reasonable guess with many false positives.
   
   Instead, would it work to make the check in tryAssignReuseContainer, tryAssignNewContainer,
and tryAssignTaskToIdleContainer? With the check made early, we can avoid deallocating containers.
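   
   A rough sketch of the idea, using hypothetical stand-in types rather than the real Tez or
YARN classes (Container, Attempt, canRunOn and tryAssign below are all made up for
illustration, and node ids are plain strings). The check runs while a container is being
picked, so a container on a node to avoid is skipped and stays idle instead of being
assigned and then released:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these types are the real Tez or YARN classes.
class EarlyNodeCheckSketch {

  /** Stand-in for a held container; only its node matters here. */
  static final class Container {
    final String id;
    final String nodeId;

    Container(String id, String nodeId) {
      this.id = id;
      this.nodeId = nodeId;
    }
  }

  /** Stand-in for a task attempt plus the nodes it should not run on. */
  static final class Attempt {
    final String name;
    final Set<String> nodesToAvoid;

    Attempt(String name, Set<String> nodesToAvoid) {
      this.name = name;
      this.nodesToAvoid = nodesToAvoid;
    }
  }

  /** The check, applied before a container is matched to the attempt. */
  static boolean canRunOn(Attempt attempt, Container container, Set<String> blacklistedNodes) {
    return !blacklistedNodes.contains(container.nodeId)
        && !attempt.nodesToAvoid.contains(container.nodeId);
  }

  /**
   * Picks the first idle container the attempt may use and removes it from the idle list.
   * Containers that fail the check are skipped, not released, so they remain
   * available for other tasks.
   */
  static Optional<Container> tryAssign(
      Attempt attempt, List<Container> idle, Set<String> blacklistedNodes) {
    for (Iterator<Container> it = idle.iterator(); it.hasNext();) {
      Container candidate = it.next();
      if (canRunOn(attempt, candidate, blacklistedNodes)) {
        it.remove();
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }
}
```

   With idle containers on nodeA and nodeB and a speculative attempt that should avoid nodeA,
tryAssign hands out the nodeB container and leaves the nodeA container in the idle list for
some other task. Whether the real tryAssign* methods can be shaped this way is something I
have not verified.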
   
   In the future, I can see passing the node to avoid to the AMRMClient when requesting new
containers, so that we do not request a node that a speculative task attempt should avoid. It
may be possible to do that now, but I have not checked.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
