falcon-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1852) Optional Input for a process not truly optional
Date Wed, 16 Mar 2016 11:15:33 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197197#comment-15197197 ]

ASF GitHub Bot commented on FALCON-1852:

GitHub user pallavi-rao opened a pull request:


    FALCON-1852 Make optional input to a process truly optional

    The following changes have been made:
    1. Create an "empty dir" under the staging path of a cluster (during cluster creation).
    2. Modify OozieELExtensions to look for the availability flag and use only those
       instances that have the entire dataset. Use the "empty dir" when no instances resolve.
    3. Update the UT and IT for additional validation.

    Tested manually with and without availabilityFlag supplied.
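
The selection rule in item 2 can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the class `OptionalInputResolver` and its method names are hypothetical, it uses `java.nio.file` on a local filesystem rather than the Hadoop FileSystem API, and treating "no availability flag" as a plain directory-existence check is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; this is not Falcon's OozieELExtensions.
final class OptionalInputResolver {
    private OptionalInputResolver() {}

    /**
     * Returns a comma-separated list of the instance directories that are
     * considered complete, or emptyDir when none of them qualify.
     *
     * availabilityFlag is the name of a marker file (for example "_SUCCESS").
     * If it is null or empty, directory existence is used as the check
     * (an assumption made for this sketch).
     */
    static String resolve(List<Path> instances, String availabilityFlag, Path emptyDir) {
        String resolved = instances.stream()
                .filter(dir -> isAvailable(dir, availabilityFlag))
                .map(Path::toString)
                .collect(Collectors.joining(","));
        // Fall back to the empty directory so the workflow still gets a valid path.
        return resolved.isEmpty() ? emptyDir.toString() : resolved;
    }

    private static boolean isAvailable(Path dir, String availabilityFlag) {
        if (availabilityFlag == null || availabilityFlag.isEmpty()) {
            return Files.isDirectory(dir);
        }
        // The instance counts only if the marker file is present inside it.
        return Files.isRegularFile(dir.resolve(availabilityFlag));
    }
}
```

With this shape, a missing instance is dropped from the list instead of being handed to the workflow, and a window with no complete instances resolves to the single "empty dir" path.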

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pallavi-rao/falcon 1852

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #73

commit 533e221ed3d5ff1af3e8726c2a54f937c63ea3c4
Author: Pallavi Rao <pallavi.rao@inmobi.com>
Date:   2016-03-16T11:11:59Z

    FALCON-1852 Make optional input to a process truly optional


> Optional Input for a process not truly optional
> -----------------------------------------------
>                 Key: FALCON-1852
>                 URL: https://issues.apache.org/jira/browse/FALCON-1852
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Pallavi Rao
>            Assignee: Pallavi Rao
> Currently, when a feed input is marked as optional, we do not add it to the coordinator
definition's datasets. This means we do not wait for all instances (for a given data window)
to arrive. Instead, we just resolve the paths for the data window and pass them as a parameter.
> For example:
> {noformat}
> <inputs>
>     <!-- In the workflow, the input paths will be available in a variable 'inpaths' -->
>     <input name="inpaths" feed="in" start="now(0,-5)" end="now(0,-1)"/>
>     <input name="in2paths" feed="in2" start="now(0,-5)" end="now(0,-1)" optional="true"/>
> </inputs>
> {noformat}
> For a process instance 2013-01-01T00:00Z, the optional input, in2paths, will be resolved
as below:
> {noformat}
>  <property>
>     <name>in2paths</name>
>     <value>hdfs://localhost:9000/data/in2/2013/11/15/00/04,hdfs://localhost:9000/data/in2/2013/11/15/00/03,hdfs://localhost:9000/data/in2/2013/11/15/00/02,hdfs://localhost:9000/data/in2/2013/11/15/00/01,hdfs://localhost:9000/data/in2/2013/11/15/00/00</value>
>   </property>
> {noformat}
> If one of the instances of in2paths (for example, hdfs://localhost:9000/data/in2/2013/11/15/00/04)
is missing, the workflow will fail anyway.
> Hence the input in2paths is not truly optional. The only effect of marking it optional is
that the triggering of the process instance is not gated on it.

