flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From dawidwys <...@git.apache.org>
Subject [GitHub] flink issue #4331: [FLINK-7169][CEP] Support AFTER MATCH SKIP function in CE...
Date Thu, 27 Jul 2017 12:44:14 GMT
Github user dawidwys commented on the issue:

    https://github.com/apache/flink/pull/4331
  
    Hi @yestinchen ,
    Thanks for the update.
    
    After second round of review. I found many problems with current approach. It returns
only the first match in a stream in most cases.
    
    1. Let's analyze a pattern `A B C` with `SKIP_TO_FIRST C` and a sequence `a1 b1 c1 a2
b2 c2`. It will return only `a1 b1 c1` and will left the NFA without any valid `ComputationalStates`
which results in stopping processing.
    
    2. Another problem is we do not handle a matches that can potentially finish before previously
started. E.g. for Pattern 
    
    ```
    Pattern<Event, ?> pattern = Pattern.<Event>begin("ab").where(new SimpleCondition<Event>()
{
    	@Override
    	public boolean filter(Event value) throws Exception {
    		return value.getName().equals("a") || value.getName().equals("b");
    	}
    }).followedBy("c").where(new IterativeCondition<Event>() {
    	@Override
    	public boolean filter(Event value, Context<Event> ctx) throws Exception {
    		return value.getName().equals("c") && ctx.getEventsForPattern("ab").iterator().next().getPrice()
== value.getPrice();
    	}
    }).setAfterMatchSkipStrategy(new AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_PAST_LAST_EVENT));
    ```
    
    and a sequence `a(price = 1) b(price = 2) c(price = 2)`. I think a desired behaviour would
be to start new matching after `c` event, but it won't as the matching started at `a` and
will not start at `b`.
    
    ---
    
    Some general notes:
    
    1. I think the SQL's specification does not suits well into CEP's library as we do not
operate on a partition/bounded collection of events. The specification on the other hand assumes
such bounded data. I think we would benefit from some additional documentation how the AFTER_MATCH
clause works in case of unbounded data. E.g. what does **_empty match_** mean:
    
    > Note that the AFTER MATCH SKIP syntax only determines the point to resume scanning
for a match after a non-empty match. When an empty match is found, one row is skipped (as
if SKIP TO NEXT ROW had been speci ed). Thus an empty match never causes one of these exceptions.
    
    etc.
    
    2. I really don't like the idea of so many cases when `RuntimeException` can be thrown.
I feel the reason for using CEP is a constantly running jobs that search for patterns in a
stream rather than ad-hoc queries. 
    E.g in case of a Pattern like `A B? C` with `SKIP_TO_LAST B` a sequence like `a c` results
in an exception and the job being killed. In my opinion it does not suits well into constantly
running job. From operational side running such Patterns would be at least interesting ;),
as they depend so much on the arriving data.
    
    3. I don't know the reasoning, but Esper, that was mentioned as the other(besides Oracle)
library that supports `MATCH_RECOGNIZE` clause does not support `AFTER MATCH` at all.
    
    4. I found out there was already an ongoing work to introduce part of the `AFTER MATCH`
(the `SKIP_PAST_LAST`). The corresponding jira: https://issues.apache.org/jira/browse/FLINK-3703
and closed PR: #2367 .
    
    To sum up thanks @yestinchen for the work. Unfortunately I think the clause needs a little
bit more conceptual discussion before we can introduce this change. I think the `SKIP_PAST_LAST`
behaviour would be very helpful (in fact there were alread requests for it in the mailing
list) and the most straight forward to implement. I would love to here your opinions @yestinchen
as well as @kl0u and @dianfu.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message