falcon-dev mailing list archives

From "Alex C (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FALCON-1149) The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
Date Thu, 09 Apr 2015 15:21:12 GMT
Alex C created FALCON-1149:
------------------------------

             Summary: The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
                 Key: FALCON-1149
                 URL: https://issues.apache.org/jira/browse/FALCON-1149
             Project: Falcon
          Issue Type: Bug
    Affects Versions: 0.6, 0.5
         Environment: HDP 2.1 sandbox, HDP 2.2 sandbox; server in UTC
            Reporter: Alex C


*Steps to reproduce:*
1. Submit a cluster named 'sandbox'
2. Submit a feed f1:
{code:xml}
<feed name="f1" description="f1" xmlns="uri:falcon:feed:0.1">
  <frequency>days(1)</frequency>
  <timezone>UTC</timezone>
  <late-arrival cut-off="hours(48)" />
  <clusters>
    <cluster name="sandbox" type="source">
      <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
      <retention limit="months(9999)" action="delete" />
    </cluster>
  </clusters>
  <locations>
    <location type="data"
      path="/f1/${YEAR}/${MONTH}/${DAY}" />
  </locations>
  <ACL owner="ambari-qa" group="users" permission="0775" />
  <schema location="/none" provider="none" />
</feed>
{code}
3. Submit a process p1:
{code:xml}
<process name="p1" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="sandbox">
      <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <order>FIFO</order>
  <frequency>days(1)</frequency>
  <outputs>
    <output name="output" feed="f1" instance="today(0,0)" />
  </outputs>
  <properties>
  </properties>
  <workflow name="p1-wf" engine="oozie" path="/apps/p1" />
  <retry policy="periodic" delay="minutes(60)" attempts="24" />
</process>
{code}
4. Submit a feed f2:
{code:xml}
<feed name="f2" description="f2" xmlns="uri:falcon:feed:0.1">
  <frequency>days(1)</frequency>
  <timezone>UTC</timezone>
  <late-arrival cut-off="hours(48)" />
  <clusters>
    <cluster name="sandbox" type="source">
      <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
      <retention limit="months(9999)" action="delete" />
    </cluster>
  </clusters>
  <locations>
    <location type="data"
      path="/f2/${YEAR}/${MONTH}/${DAY}" />
  </locations>
  <ACL owner="ambari-qa" group="users" permission="0775" />
  <schema location="/none" provider="none" />
</feed>
{code}
5. Submit a process p2:
{code:xml}
<process name="p2" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="sandbox">
      <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <order>FIFO</order>
  <frequency>days(1)</frequency>
  <inputs>
    <input name="input" feed="f1" start="today(0,0)" end="today(0,0)" />
  </inputs>
  <outputs>
    <output name="output" feed="f2" instance="today(0,0)" />
  </outputs>
  <workflow name="p2-wf" engine="oozie" path="/apps/p2" />
  <retry policy="periodic" delay="minutes(60)" attempts="24" />
</process>
{code}
6. Note that:
- Process p1 has no input feed (p1 fetches its data from some other location).
- Feed f1 is the output of p1 and the input of p2.
- All feeds are daily, and every process input feed range and output feed instance is daily as well, by way of the 'today(0,0)' EL expression.

7. Finally, schedule all feeds and processes after 08:30Z on a given day ('today').

*Expected:*
1. The first scheduled instance for p1 proceeds to COMPLETED and produces a partition in
f1 for 'today'.
2. The first scheduled instance for p2 proceeds to COMPLETED and produces a partition in
f2 for 'today', since it looks for and finds the corresponding partition for 'today' in f1.

*Actual:*
1. The first scheduled instance for p1 proceeds to COMPLETED and produces a partition in
f1 for 'today'.
2. However, the first scheduled instance for p2 is left in the WAITING state, since it is looking
for a partition in f1 for 'yesterday', which does not exist (and will never exist).

I am currently working around this unexpected behaviour by specifying both the start and the end
of the input feed range for p2 as 'today(24,0)' instead of 'today(0,0)'.
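
For reference, the workaround amounts to the following change to the inputs element of p2. Only the start and end values differ from the definition in step 5. Why the 24-hour offset selects the 'today' partition is not established in this report, so this is a stopgap rather than a fix.
{code:xml}
<inputs>
  <!-- Workaround: today(24,0) replaces today(0,0) for both start and end -->
  <input name="input" feed="f1" start="today(24,0)" end="today(24,0)" />
</inputs>
{code}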

Please advise whether this is (a) a bug or (b) a mistake in the configuration.

Many thanks,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
