helix-commits mailing list archives

From "Hunter L (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HELIX-791) TASK2.0: Add RuntimeJobDag with job iterator functionality
Date Tue, 05 Mar 2019 01:07:00 GMT

     [ https://issues.apache.org/jira/browse/HELIX-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hunter L updated HELIX-791:
---------------------------
    Description: 
Job list iterator methods and the underlying data structure were added to JobDag to support retrieval
of jobs by TaskDispatcher (to be implemented) as part of the Task Framework improvements.

Changelist:
 1. Add RuntimeJobDag
 2. Add a unit test

See below for the design considerations for this internal Helix component:

h1. Purpose

For the new generation of Helix Task Framework, we need an intermediary iterator that extracts
a list of jobs that need to be processed from WorkflowConfig (jobDag) and WorkflowContext
(jobStates). This document proposes a design for it and presents a few important points of
discussion for performance and ease of use.
h1. Requirements
 # Given a workflow or a JobQueue, it needs to iterate over all jobs in the workflow and produce
jobs in an order that *satisfies the topological order* of the DAG.
 # (Optional - discuss) The JobListIterator must produce a topological ordering of jobs in
a *deterministic* way (easily doable by using ordered data structures such as TreeMap).
 # The JobListIterator must do so in an *efficient* manner and should avoid overhead where
possible.
 # (Optional) Upon Controller switch, it needs to be able to re-build itself and restore the
job status where possible based on WorkflowContext/JobContext.
 # JobDAG must be backward-compatible.
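The determinism called for in requirement 2 can be sketched as follows (illustrative Java, not actual Helix code; the class and method names are hypothetical): backing the ready list with an ordered collection such as TreeSet makes the iteration order independent of insertion order, so repeated scheduler runs pick jobs in the same sequence.

```java
import java.util.*;

// Illustrative sketch (not Helix code): an ordered ready list gives
// deterministic scheduling for free, because a TreeSet iterates in natural
// order no matter the order in which jobs were inserted.
public class DeterministicReadyList {
    // Drain a ready list of un-parented jobs in deterministic order.
    static List<String> scheduleOrder(Collection<String> unParentedJobs) {
        NavigableSet<String> readyList = new TreeSet<>(unParentedJobs);
        List<String> order = new ArrayList<>();
        while (!readyList.isEmpty()) {
            order.add(readyList.pollFirst()); // always the smallest remaining job name
        }
        return order;
    }

    public static void main(String[] args) {
        // Two different insertion orders yield the same schedule.
        System.out.println(scheduleOrder(Arrays.asList("job_c", "job_a", "job_b")));
        System.out.println(scheduleOrder(Arrays.asList("job_b", "job_c", "job_a")));
        // Both print: [job_a, job_b, job_c]
    }
}
```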

h1. Architecture & Design
h2. Background

Optimal scheduling for a given DAG and a number of processors (in this case, "consumers",
or _Helix Participants_ that receive tasks) is a widely studied problem in CS. For general
DAGs with a variable number of processors, however, this problem is *NP-complete*. For 2 processors
there exists a polynomial-time algorithm, but above 2 the complexity is unknown. It has been
shown, though, that when the DAG is a tree, level-by-level scheduling is optimal (Aho,
Hopcroft).

The heuristic we could utilize here is the *list scheduling algorithm*. The fact that we do
not have to consider the cost/duration of jobs greatly simplifies the problem (and also
saves us the trouble of worrying about the optimality of the solution). Below is a
simplified pseudocode version of the list scheduling algorithm tailored to our needs.

 
h2. Algorithm
{code}
ready-list = {START};  // START = set of un-parented jobs (jobs that do not depend on any other jobs)
inflight-list = {};
while (|ready-list| + |inflight-list| > 0) {
    for each node n in ready-list in deterministic order {  // schedule new tasks
        if (Helix Participant has capacity) {
            remove n from ready-list and add to inflight-list;
            return node to the Helix Participant;
        }
        else break;
    }
    for each node n in inflight-list {  // determine ready tasks
        if (Helix Participant finishes the job) {
            remove n from inflight-list;
            add every ready successor of n in DAG to ready-list;
        }
    }
}
{code}

This will be further tweaked at implementation time to expose the APIs of an iterator, such
as _getNextJob()_.
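As a sketch of how the ready-list/in-flight-list algorithm above could back such an iterator (hypothetical Java; the names JobDagIteratorSketch, addParentChildPair, and finishJob are illustrative, not the actual Helix RuntimeJobDag API):

```java
import java.util.*;

// Hypothetical sketch, not the actual Helix RuntimeJobDag API: a job DAG that
// hands out ready jobs one at a time and tracks them in an in-flight list.
public class JobDagIteratorSketch {
    private final Map<String, Set<String>> successors = new TreeMap<>();
    private final Map<String, Integer> inDegree = new TreeMap<>();
    private final NavigableSet<String> readyList = new TreeSet<>(); // ordered => deterministic
    private final Set<String> inflightList = new HashSet<>();

    public void addJob(String job) {
        successors.putIfAbsent(job, new TreeSet<>());
        inDegree.putIfAbsent(job, 0);
    }

    public void addParentChildPair(String parent, String child) {
        addJob(parent);
        addJob(child);
        if (successors.get(parent).add(child)) {
            inDegree.merge(child, 1, Integer::sum);
        }
    }

    // Seed the ready list with START: the set of un-parented jobs.
    public void start() {
        inDegree.forEach((job, deg) -> { if (deg == 0) readyList.add(job); });
    }

    public boolean hasNextJob() {
        return !readyList.isEmpty();
    }

    // Move the next ready job to the in-flight list and hand it out.
    public String getNextJob() {
        String job = readyList.pollFirst();
        if (job == null) throw new NoSuchElementException("no job is ready");
        inflightList.add(job);
        return job;
    }

    // Report completion; successors whose parents are all done become ready.
    public void finishJob(String job) {
        inflightList.remove(job);
        for (String succ : successors.get(job)) {
            if (inDegree.merge(succ, -1, Integer::sum) == 0) {
                readyList.add(succ);
            }
        }
    }
}
```

Keeping both lists explicit lets the iterator restore itself after a Controller switch by replaying job states from WorkflowContext/JobContext (requirement 4).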
h1. Alternative Designs
h2. Flattening the DAG

The problems with this approach (vs. unfolding the DAG one step at a time using a ready list
and an in-flight list) are the following:
 # Once flattened with a topological ordering, the DAG loses its dependency information.
 # It could limit parallelizability of the DAG.
 # Failed jobs could potentially block the execution of other jobs due to artificial dependencies
created in the process of flattening.
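A toy illustration of point 3 (hypothetical Java, not Helix code): in the DAG a -> c, b -> c, jobs a and b are independent, but any flattening, say [a, b, c], inserts an artificial edge from a to b, so a failure of a blocks b as well.

```java
import java.util.*;

// Toy illustration (not Helix code) of the artificial dependencies a
// flattened ordering creates. DAG: a -> c, b -> c; a and b are independent.
public class FlattenDemo {
    // Under a flattened order, everything after the failed job is blocked.
    static List<String> blockedUnderFlattening(List<String> order, String failed) {
        return order.subList(order.indexOf(failed) + 1, order.size());
    }

    // Under the real DAG, only jobs that actually depend on the failed job are.
    static List<String> blockedUnderDag(Map<String, List<String>> parents, String failed) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : parents.entrySet()) {
            if (e.getValue().contains(failed)) blocked.add(e.getKey());
        }
        return blocked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = new TreeMap<>();
        parents.put("a", Collections.emptyList());
        parents.put("b", Collections.emptyList());
        parents.put("c", Arrays.asList("a", "b"));

        List<String> flattened = Arrays.asList("a", "b", "c"); // one topological order
        System.out.println(blockedUnderFlattening(flattened, "a")); // [b, c]
        System.out.println(blockedUnderDag(parents, "a"));          // [c]
    }
}
```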

  was:
Job list iterator methods and underlying data structure were added to JobDag to support retrieval
of jobs by TaskDispatcher (to be implemented) for improvement in Task Framework.

Changelist:
1. Add RuntimeJobDag
2. Add a unit test


> TASK2.0: Add RuntimeJobDag with job iterator functionality
> ----------------------------------------------------------
>
>                 Key: HELIX-791
>                 URL: https://issues.apache.org/jira/browse/HELIX-791
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Assignee: Hunter L
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
