calcite-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: MATCH_RECOGNIZE
Date Mon, 06 Aug 2018 21:45:46 GMT
If a JDBC driver is a problem, it shouldn’t be hard to mock a connection that can create
a statement that can describe itself and execute a query. Quidem makes light use of JDBC.
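
A minimal sketch of such a mock, built from java.lang.reflect.Proxy so that only the handful of JDBC methods needed here are implemented. The Engine interface is a hypothetical stand-in for whatever runs the query (it is not a Flink or Quidem API), and the set of JDBC calls covered is an assumption; which methods Quidem actually invokes would need to be checked against its source.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;
import java.util.List;

/** Sketch of a mock JDBC connection; every name here is illustrative. */
class MockJdbc {
  /** Hypothetical hook into the engine under test. */
  interface Engine {
    List<String> columnNames(String sql);
    List<Object[]> execute(String sql);
  }

  /** Handles a JDBC call by method name (overloads are not distinguished). */
  interface Handler {
    Object call(String method, Object[] args);
  }

  @SuppressWarnings("unchecked")
  static <T> T proxy(Class<T> iface, Handler handler) {
    return (T) Proxy.newProxyInstance(MockJdbc.class.getClassLoader(),
        new Class<?>[] {iface},
        (self, method, args) -> handler.call(method.getName(), args));
  }

  /** A connection that can only create statements and be closed. */
  static Connection connect(Engine engine) {
    return proxy(Connection.class, (name, args) -> {
      switch (name) {
      case "createStatement": return statement(engine);
      case "close": return null;
      default: throw new UnsupportedOperationException(name);
      }
    });
  }

  /** A statement that can only execute a query and be closed. */
  static Statement statement(Engine engine) {
    return proxy(Statement.class, (name, args) -> {
      switch (name) {
      case "executeQuery": return resultSet(engine, (String) args[0]);
      case "close": return null;
      default: throw new UnsupportedOperationException(name);
      }
    });
  }

  /** A forward-only result set that describes its columns by name only. */
  static ResultSet resultSet(Engine engine, String sql) {
    List<String> columns = engine.columnNames(sql);
    Iterator<Object[]> rows = engine.execute(sql).iterator();
    Object[][] current = new Object[1][];
    ResultSetMetaData metaData = proxy(ResultSetMetaData.class, (name, args) -> {
      switch (name) {
      case "getColumnCount": return columns.size();
      case "getColumnLabel":
      case "getColumnName": return columns.get(((Integer) args[0]) - 1);
      default: throw new UnsupportedOperationException(name);
      }
    });
    return proxy(ResultSet.class, (name, args) -> {
      switch (name) {
      case "getMetaData": return metaData;
      case "next":
        if (!rows.hasNext()) return false;
        current[0] = rows.next();
        return true;
      case "getObject": return current[0][((Integer) args[0]) - 1]; // by index only
      case "close": return null;
      default: throw new UnsupportedOperationException(name);
      }
    });
  }

  /** Runs a query through the mock and renders the header and rows as CSV. */
  static String dump(Engine engine, String sql) {
    StringBuilder out = new StringBuilder();
    try (Connection connection = connect(engine);
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(sql)) {
      ResultSetMetaData metaData = resultSet.getMetaData();
      int n = metaData.getColumnCount();
      for (int i = 1; i <= n; i++) {
        out.append(i > 1 ? "," : "").append(metaData.getColumnLabel(i));
      }
      out.append('\n');
      while (resultSet.next()) {
        for (int i = 1; i <= n; i++) {
          out.append(i > 1 ? "," : "").append(resultSet.getObject(i));
        }
        out.append('\n');
      }
    } catch (SQLException e) {
      throw new IllegalStateException(e);
    }
    return out.toString();
  }
}
```

Any JDBC method the mock does not list throws UnsupportedOperationException, which makes it easy to see what a caller really needs and to add methods one at a time.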

> On Aug 6, 2018, at 10:33 AM, Fabian Hueske <fhueske@gmail.com> wrote:
> 
> OK, I see.
> Flink doesn't have support for JDBC yet.
> Would need to look into that.
> 
> 2018-08-02 21:35 GMT+02:00 Julian Hyde <jhyde@apache.org>:
> 
>> Quidem can run on top of any JDBC data source (you just need to invoke
>> with a connection factory by implementing a simple SPI). But it requires
>> queries to terminate (i.e. can’t handle streaming queries). So, if Flink
>> SQL were able to run queries on an EMP table, then I think it could be
>> tested using Quidem.
>> 
>>> On Aug 2, 2018, at 6:27 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>>> 
>>> Hi Julian,
>>> 
>>> It would be great to use the same test suite.
>>> 
>>> We have quite a few tests in Flink but they are not super well organized.
>>> I would love to have more structure for at least some of the tests.
>>> 
>>> I had a quick look at how Calcite runs its Quidem tests.
>>> Not sure if this is a format that we could easily adapt to, but maybe it's
>>> possible to put a test data set, queries, and results in a more portable
>>> format.
>>> 
>>> Best, Fabian
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 2018-07-31 19:54 GMT+02:00 Julian Hyde <jhyde@apache.org>:
>>> 
>>>> I’m delighted that Flink is getting full SQL support for MATCH_RECOGNIZE.
>>>> 
>>>> Sounds like it might be challenging to share the implementation, but
>>>> could we perhaps share the test suite? (I.e. a set of SQL queries and
>>>> their expected results.)
>>>> 
>>>> I added a simple test in
>>>> https://github.com/julianhyde/calcite/commit/ee460847643ec17544f310088affd99be4028bb6
>>>> that could be extended.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Jul 31, 2018, at 8:07 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I'd like to share the plans for MATCH_RECOGNIZE support in Flink.
>>>>> 
>>>>> Flink has featured a so-called CEP library for quite some time [1]. The
>>>>> CEP library is a popular and frequently used feature.
>>>>> In a nutshell, the library provides a domain-specific API to define event
>>>>> patterns. The patterns are translated into a state machine and evaluated
>>>>> in a streaming program.
>>>>> 
>>>>> Even before we learned about MATCH_RECOGNIZE, Till (another Flink
>>>>> committer) and I gave a few talks about unifying SQL and CEP [2].
>>>>> Hence, we were quite excited when we learned about MATCH_RECOGNIZE and
>>>>> even more when it was added to Calcite.
>>>>> Shortly after that, we got a PR [3] which translated the parsed
>>>>> MATCH_RECOGNIZE clause into patterns of our CEP library.
>>>>> However, we never really got to the point of merging that contribution,
>>>>> mainly because there were some inconsistencies in the semantics of
>>>>> MATCH_RECOGNIZE and Flink's CEP library.
>>>>> 
>>>>> Recently, a Flink committer picked up this feature again, validated the
>>>>> semantics, and made a few corrections [4].
>>>>> The CEP library is now ready to support a subset of the MATCH_RECOGNIZE
>>>>> features.
>>>>> Unfortunately, MATCH_RECOGNIZE support won't make it into the upcoming
>>>>> 1.6.0 release, but the plans are to add it for the 1.7.0 release.
>>>>> 
>>>>> Regarding the idea of sharing parts of the evaluation logic:
>>>>> Flink has runtime support for a subset of the MATCH_RECOGNIZE clause.
>>>>> Unfortunately, I am not familiar with the internals of Flink's CEP
>>>>> library and don't know how portable it is.
>>>>> 
>>>>> Best, Fabian
>>>>> 
>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/cep.html
>>>>> [2] https://www.slideshare.net/tillrohrmann/streaming-analytics-cep-two-sides-of-the-same-coin
>>>>> [3] https://github.com/apache/flink/pull/4502
>>>>> [4] https://issues.apache.org/jira/browse/FLINK-9593
>>>>> 
>>>>> 2018-07-23 21:03 GMT+02:00 Sergey Nuyanzin <snuyanzin@gmail.com>:
>>>>> 
>>>>>> Looks exciting.
>>>>>> If possible, I would like to take part in it; however, I'm not sure
>>>>>> about this week (I could from August on).
>>>>>> 
>>>>>> On Mon, Jul 23, 2018 at 9:10 PM, Michael Mior <mmior@apache.org> wrote:
>>>>>> 
>>>>>>> This does sound like my idea of fun, but unfortunately I won't have
>>>>>>> the time to contribute in the near future. I'll keep this on my radar
>>>>>>> though. I also shared this message with all the students in our
>>>>>>> research group and I wouldn't be surprised if there was someone
>>>>>>> willing to jump in. Thanks for keeping this moving Julian!
>>>>>>> 
>>>>>>> --
>>>>>>> Michael Mior
>>>>>>> mmior@apache.org
>>>>>>> 
>>>>>>> On Mon, Jul 23, 2018 at 1:54 PM, Julian Hyde <jhyde@apache.org> wrote:
>>>>>>>> 
>>>>>>>> For quite a while we have had partial support for MATCH_RECOGNIZE. We
>>>>>>>> support it in the parser and validator, but there is no runtime
>>>>>>>> implementation. It’s a shame, because MATCH_RECOGNIZE is an incredibly
>>>>>>>> powerful SQL feature for both traditional SQL (it’s in Oracle 12c) and
>>>>>>>> for continuous query (aka complex event processing - CEP).
>>>>>>>> 
>>>>>>>> I figure it’s time to change that. My plan is to implement it
>>>>>>>> incrementally, getting simple queries working to start with, then
>>>>>>>> allowing people to add more complex queries.
>>>>>>>> 
>>>>>>>> In a dev branch [1], I’ve added a method Enumerables.match [2]. The
>>>>>>>> idea is that you supply an Enumerable of input data, a finite state
>>>>>>>> machine to figure out when a sequence of rows makes a match
>>>>>>>> (represented by a transition function: (state, row) -> state), and a
>>>>>>>> function to convert a matched set of rows to a set of output rows. The
>>>>>>>> match method is fairly straightforward, and I almost have it finished.
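
The three inputs described above can be sketched as follows. This is not the signature or behavior of Enumerables.match in the dev branch; the class name, the accepting-state predicate, and the use of a null state to mean "no match possible" are all assumptions made for illustration. The matching is deliberately naive: no backtracking, no overlapping matches, and a match is emitted as soon as an accepting state is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

/** Sketch of a match operator driven by a transition function. */
class MatchSketch {
  /**
   * Feeds rows through (state, row) -> state; a null result means the row
   * cannot extend the current partial match.
   */
  static <R, S, O> List<O> match(Iterable<R> input, S start,
      BiFunction<S, R, S> transition, Predicate<S> accepting,
      Function<List<R>, List<O>> emitter) {
    List<O> output = new ArrayList<>();
    List<R> pending = new ArrayList<>(); // rows of the partial match so far
    S state = start;
    for (R row : input) {
      S next = transition.apply(state, row);
      if (next == null && !pending.isEmpty()) {
        // Dead end: drop the partial match and retry this row from the start.
        pending.clear();
        next = transition.apply(start, row);
      }
      if (next == null) {
        state = start;
        continue;
      }
      state = next;
      pending.add(row);
      if (accepting.test(state)) {
        // Complete match: convert the matched rows to output rows and reset.
        output.addAll(emitter.apply(new ArrayList<>(pending)));
        pending.clear();
        state = start;
      }
    }
    return output;
  }

  public static void main(String[] args) {
    // Pattern (A B): state 0 = start, 1 = seen an "a" row, 2 = accepting.
    BiFunction<Integer, String, Integer> transition = (s, row) -> {
      if (s == 0 && row.startsWith("a")) return 1;
      if (s == 1 && row.startsWith("b")) return 2;
      return null;
    };
    Predicate<Integer> accepting = s -> s == 2;
    Function<List<String>, List<String>> emitter =
        rows -> Collections.singletonList(String.join("", rows));
    List<String> input = Arrays.asList("x", "a1", "a2", "b1", "c", "a3", "b2");
    System.out.println(match(input, 0, transition, accepting, emitter));
    // prints [a2b1, a3b2]
  }
}
```

Quantifiers such as A+ or greedy versus reluctant matching would need a richer state machine than this single current state, which is where the real complexity lies.
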
>>>>>>>> 
>>>>>>>> The complexity is in generating the finite state machine, emitter
>>>>>>>> function, and so forth.
>>>>>>>> 
>>>>>>>> Can someone help me with this task? If your idea of fun is
>>>>>>>> implementing database algorithms, this is about as much fun as it
>>>>>>>> gets. You learned about finite state machines in college - this is
>>>>>>>> your chance to actually write one!
>>>>>>>> 
>>>>>>>> This might be a good joint project with the Flink community. I know
>>>>>>>> Flink are thinking of implementing CEP, and the algorithm we write
>>>>>>>> here could be shared with Flink (for use via Flink SQL or via the
>>>>>>>> Flink API).
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> [1] https://github.com/julianhyde/calcite/commits/1935-match-recognize
>>>>>>>> 
>>>>>>>> [2] https://github.com/julianhyde/calcite/commit/4dfaf1bbee718aa6694a8ce67d829c32d04c7e87#diff-8a97a64204db631471c563df7551f408R73
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards,
>>>>>> Sergey
>>>> 
>>>> 
>> 
>> 

