helix-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-601) Allow work flow to schedule dependency jobs in parallel
Date Sun, 26 Jul 2015 00:56:04 GMT

    [ https://issues.apache.org/jira/browse/HELIX-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641815#comment-14641815 ]

ASF GitHub Bot commented on HELIX-601:

Github user brandtg commented on the pull request:

    What's the specific use case for this feature? I'm imagining something like, say, re-indexing
multiple partitions of data: you want to just give it a shot and let the chips fall as they
may, then re-run the task on problem partitions.
    Maybe I'm not understanding it correctly, and there might be some gotcha with this suggestion...
But if you have a task that can be partitioned into N sequences of steps and requires no synchronization
between steps, why don't you just model it as N tasks, each with one partition (i.e. each has
its own target resource)?
    Obviously this puts more load on ZK, but I'm wondering if it'd be okay for your use case.
You could also leverage batch messaging to mitigate this, I think.
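
The suggestion above (N independent chains, one per partition) can be sketched as a plain dependency map. This is only an illustration and does not use the Helix task API: the function name `per_partition_chains`, the `<step>_p<partition>` job naming, and the example step names are all hypothetical.

```python
from typing import Dict, List


def per_partition_chains(steps: List[str], num_partitions: int) -> Dict[str, List[str]]:
    """Build one independent chain of jobs per partition.

    Returns a map from job name to the list of jobs it depends on.
    Jobs in different partitions never depend on each other, so a slow
    partition only delays its own chain.
    """
    parents: Dict[str, List[str]] = {}
    for partition in range(num_partitions):
        previous = None
        for step in steps:
            job = f"{step}_p{partition}"  # hypothetical naming scheme
            parents[job] = [previous] if previous is not None else []
            previous = job
    return parents


chains = per_partition_chains(["extract", "index"], 2)
```

Each partition contributes its own job entries, which is where the extra ZK load mentioned above would come from: the number of jobs grows as steps times partitions.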

> Allow work flow to schedule dependency jobs in parallel
> -------------------------------------------------------
>                 Key: HELIX-601
>                 URL: https://issues.apache.org/jira/browse/HELIX-601
>             Project: Apache Helix
>          Issue Type: New Feature
>            Reporter: Congrui Ji
> Currently, Helix won't schedule dependency jobs of the same workflow in parallel. For example, if
> Job2 depends on Job1, Job2 won't be scheduled until every partition of Job1 is completed.
> However, if some participant is very slow, then all dependency jobs are waiting for that
> single participant.
> Helix should be able to schedule multiple jobs according to a parameter.
> A.C.
> 1. Introduce a parallel count parameter in workflow and job queue.
> 2. Dependency jobs can be scheduled according to the parameter (currently the parameter is
> always 1, so nothing runs in parallel).
> 3. If Job2 depends on Job1, Job1 is scheduled before Job2.
> 4. No parallel jobs on the same instance. If an instance is running Job1, it won't run
> Job2 until Job1 is finished.
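
One possible reading of the four criteria above, as a minimal sketch. It is not Helix code and not the implementation in the pull request; `jobs_to_start`, `can_run_on`, and all parameter names are hypothetical, and it assumes that "scheduled before" means a job becomes eligible once all of its parent jobs have been started.

```python
from typing import Dict, List, Set


def jobs_to_start(
    order: List[str],               # jobs listed parents-first (assumed input)
    parents: Dict[str, List[str]],  # job -> jobs it depends on
    started: Set[str],
    finished: Set[str],             # assumed to be a subset of `started`
    parallel_count: int,
) -> List[str]:
    """Pick the jobs that may be started now (criteria 1 to 3)."""
    free_slots = parallel_count - len(started - finished)
    picked: List[str] = []
    for job in order:
        if free_slots <= 0:
            break
        if job in started:
            continue
        # A job is eligible only after every parent has been started.
        if all(p in started or p in picked for p in parents.get(job, [])):
            picked.append(job)
            free_slots -= 1
    return picked


def can_run_on(instance: str, job: str, running_on: Dict[str, str]) -> bool:
    """Criterion 4: an instance works on at most one job at a time."""
    current = running_on.get(instance)
    return current is None or current == job


order = ["Job1", "Job2", "Job3"]
parents = {"Job2": ["Job1"], "Job3": ["Job2"]}
first_round_serial = jobs_to_start(order, parents, set(), set(), 1)
first_round_parallel = jobs_to_start(order, parents, set(), set(), 2)
```

With a parallel count of 1 the sketch reduces to the current behaviour, where Job2 waits until Job1 has finished; with 2, Job2 can start while Job1 is still running, but only on instances that are not busy with Job1.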

This message was sent by Atlassian JIRA
