flink-user mailing list archives

From Sahil Arora <sahilarora....@gmail.com>
Subject Re: Optimizing multiple aggregate queries on a CEP using Flink
Date Thu, 15 Feb 2018 19:59:38 GMT
Thank you Kostas for your inputs. We will try to integrate an optimizer
into flink and will get back in case we get stuck.

Regards.

On Thu, 15 Feb 2018 at 19:11 Kostas Kloudas <k.kloudas@data-artisans.com>
wrote:

> Hi Sahil,
>
> Currently CEP does not support multi-query optimizations out-of-the-box.
> In some cases you can do manual optimizations to your code, but there is
> no optimizer involved.
>
> Cheers,
> Kostas
>
>
> On Feb 15, 2018, at 11:12 AM, Sahil Arora <sahilarora.535@gmail.com>
> wrote:
>
> Hi Timo,
> Thanks a lot for the help. I will be looking forward to a reply from
> Kostas to be clearer on this.
>
>
> On Mon, 12 Feb 2018, 10:01 pm Timo Walther, <twalthr@apache.org> wrote:
>
>> Hi Sahil,
>>
>> I'm not a CEP expert but I will loop in Kostas (in CC). In general, the
>> example that you described can be easily done with a ProcessFunction [1]. A
>> process function not only allows you to keep state (like a count) but also
>> lets you set timers flexibly for specific use cases, so that
>> aggregations can be triggered/reused. So in general I would say that
>> implementing and testing such an algorithm is possible. How easily it can be
>> integrated into the CEP API, I don't know.
>>
>> Regards,
>> Timo
>>
>>
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html
>>
>> On 2/9/18 at 11:28 PM, Sahil Arora wrote:
>>
>> Hi there,
>> We have been working on a project with the title "Optimizing Multiple
>> Aggregate Queries over a Complex Event Processing Engine". The aim is to
>> optimize a group of queries. Take, for example, the two queries *"how many
>> cars passed the post in the past 1 minute"* and *"how many cars passed the
>> post in the past 2 minutes"*. The naive and inefficient method is to
>> answer each of these queries independently, one by one. The better way
>> would be to minimize the computation by reusing the answer to query 1
>> when answering query 2. This is basically our aim: to minimize
>> computation cost when we have multiple aggregate queries in a CEP.
>>
>> We have been searching for some platform which supports CEP, and Flink is
>> probably one of them. Hence, it would be very helpful if we could get some
>> answers to the following questions:
>>
>> 1. Does flink already have some method of optimizing multiple aggregate
>> queries?
>> 2. Is it possible for us to implement / test such an algorithm in flink
>> which considers multiple queries in a CEP, like having a database of SQL
>> queries and testing an algorithm of our choice?
>>
>> Any other inputs which may help us with solving the problem would be
>> highly welcome.
>>
>> Thanks a lot.
>> --
>> Sahil Arora
>> Final year B.Tech Undergrad | Indian Institute of Technology Mandi
>> Web: https://sahilarora535.github.io
>> LinkedIn: sahilarora535 <https://www.linkedin.com/in/sahilarora535/>
>> Ph: +91-8130506047 <+91%2081305%2006047>
>>
>>
>
>
> --
Sahil Arora
Final year B.Tech Undergrad | Indian Institute of Technology Mandi
Web: https://sahilarora535.github.io
LinkedIn: sahilarora535 <https://www.linkedin.com/in/sahilarora535/>
Ph: +91-8130506047 <+91%2081305%2006047>
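
The reuse idea described in the original question (answering the 2-minute count from work already done for the 1-minute count) can be illustrated with a small plain-Java sketch. It does not use any Flink or CEP API. The class name `SharedPaneCounter`, its methods, and the choice of one-minute "panes" are assumptions made for illustration only, and window ends are assumed to fall on minute boundaries with non-negative timestamps. Each event is counted once into its one-minute pane, and both queries are answered by summing pane counts instead of rescanning events.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: counts events in one-minute panes shared by all window queries. */
public class SharedPaneCounter {
    private static final long PANE_MILLIS = 60_000L;

    // pane index (event time / one minute) -> number of events in that pane
    private final Map<Long, Long> paneCounts = new HashMap<>();

    /** Record one event (e.g. a car passing the post) at the given event time. */
    public void record(long eventTimeMillis) {
        paneCounts.merge(eventTimeMillis / PANE_MILLIS, 1L, Long::sum);
    }

    /**
     * Count the events in the last `minutes` one-minute panes ending at
     * `windowEndMillis` (assumed to be aligned to a minute boundary).
     */
    public long countLastMinutes(int minutes, long windowEndMillis) {
        long endPane = windowEndMillis / PANE_MILLIS; // exclusive upper bound
        long total = 0;
        for (long pane = endPane - minutes; pane < endPane; pane++) {
            total += paneCounts.getOrDefault(pane, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        SharedPaneCounter counter = new SharedPaneCounter();
        long[] carTimes = {5_000L, 30_000L, 70_000L, 90_000L, 110_000L};
        for (long t : carTimes) {
            counter.record(t);
        }
        long end = 120_000L; // evaluate both queries at the 2-minute mark
        System.out.println("last 1 minute:  " + counter.countLastMinutes(1, end)); // 3
        System.out.println("last 2 minutes: " + counter.countLastMinutes(2, end)); // 5
    }
}
```

In a Flink job, the per-pane counts would presumably be kept as state inside a ProcessFunction, with timers triggering the output, along the lines Timo suggests above. That wiring is not shown here.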
