calcite-dev mailing list archives

From Julian Feinauer <>
Subject Re: PLC4X Request Optimization
Date Tue, 20 Aug 2019 13:38:34 GMT
Hi Stamatis,

thanks for your response.
I think my brain just needs a bit more time to get really deep into those advanced planning
topics (it's sometimes slow at adapting...).
But I will look through it.
We have a meetup this week and will discuss the matter and how to set everything up to enable
some optimization at first (introducing cost estimates and such), and then I will again have
a deeper look and perhaps prepare a test case or a runnable test.
Then it's probably easiest to reason about.


On 20.08.19 at 15:19, "Stamatis Zampetakis" <> wrote:

    Hi Julian F,
    I admit that I didn't really get your example, but talking about 'batch
    request optimization' and 'collapsing "overlapping" but not equal
    requests', I get the impression that the problem is optimizing sets of
    queries which may have common sub-expressions; this problem is usually
    referred to as multi-query optimization and is indeed relevant to the
    Spool operator mentioned by Julian H.
    If that's the case, then the most relevant work I can think of is [1],
    which solves the problem by slightly modifying the search strategy of
    the Volcano planner.
    [1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi
    query optimization." ACM SIGMOD Record, Vol. 29, No. 2, ACM, 2000.
    On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <> wrote:
    > Hi Julian,
    > thanks for the reply.
    > I have to think about that, I think.
    > But as I understand the Spool operator, it is meant to factor out
    > multiple calculations of the same expression.
    > In our situation we aim more at collapsing "overlapping" but not equal
    > requests.
    > Consider 8 bits which physically form a byte.
    > If I read 8 BOOLEANs, I have 8 different requests which each mask one bit and
    > return it (padded) as a byte. So 8 requests and 8 bytes of data transfer (plus
    > masking on the PLC).
    > If I optimized this to read the byte in one request and do the masking
    > afterwards, I would have one request and only 1 byte transferred (plus no
    > masking on the PLC, which keeps pressure low there).
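The collapsing idea above can be sketched in plain Java. All names here (BitRequest, readByte, collapse) are invented for illustration and are not the actual PLC4X API; the point is only that one physical read plus local masking replaces eight wire requests.

```java
// Sketch of collapsing eight single-bit BOOLEAN reads of the same byte into
// one physical byte read plus client-side masking. All names are invented
// for illustration; this is not the PLC4X API.
import java.util.ArrayList;
import java.util.List;

public class BitCollapseSketch {

    /** A logical request for one bit of the byte at a given address. */
    record BitRequest(int byteAddress, int bitIndex) {}

    /** Stand-in for a single physical byte read from the PLC. */
    static int readByte(int byteAddress) {
        return 0b1010_0110; // pretend this came over the wire
    }

    /** One physical request; each requested bit is masked out locally. */
    static List<Boolean> collapse(List<BitRequest> requests) {
        int value = readByte(requests.get(0).byteAddress()); // 1 request, 1 byte transferred
        List<Boolean> results = new ArrayList<>();
        for (BitRequest r : requests) {
            results.add(((value >> r.bitIndex()) & 1) == 1); // masking here, not on the PLC
        }
        return results;
    }

    public static void main(String[] args) {
        List<BitRequest> batch = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            batch.add(new BitRequest(0x100, i)); // 8 logical BOOLEAN reads
        }
        System.out.println(collapse(batch)); // one wire request instead of eight
    }
}
```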
    > This could be modelled by introducing respective "RelNodes" and planner
    > rules, I think, but I do not fully understand how Spool fits in here?
    > Julian
    > On 19.08.19 at 20:42, "Julian Hyde" <> wrote:
    >     One tricky aspect is to optimize a *batch* of requests.
    >     The trick is to tie together the batch so that it is costed as one
    > request. We don’t have an operator specifically for that, but you could for
    > instance use UNION ALL. E.g. given Q1 and Q2, you could generate a plan for
    >       select count(*) from Q1 union all select count(*) from Q2
    >     If the plan for the batch is a DAG (i.e. sharing work between the
    > components of the batch by creating something akin to “temporary tables”)
    > then you are in the territory for which we created the Spool operator (see
    > discussion in <
    >     Julian
    >     > On Aug 19, 2019, at 6:34 AM, Julian Feinauer <
    >> wrote:
    >     >
    >     > Hi Danny,
    >     >
    >     > thanks for the quick reply.
    >     > Cost calculation we can of course provide (but it could be a bit
    > different as we have not only CPU and Memory but also Network or something).
    >     >
    >     > And also something like the RelNodes could be provided. In our case
    > this would be "Requests" which are at first "Logical" and are then
    > transformed to "Physical" Requests. For example the API allows you to
    > request many fields per single request but some PLCs only allow one field
    > per request. So this would be one task of this layer.
    >     >
    >     > Julian
    >     >
    >     > On 19.08.19 at 14:44, "Danny Chan" <> wrote:
    >     >
    >     >    Cool idea ! Julian Feinauer ~
    >     >
    >     >    I think the Volcano model can be used as the base of the cost
    > algorithm, as long as you define all the metadata that you care about.
    > Another thing is that you need a struct like RelNode and a method
    > like #computeSelfCost.
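Danny's suggestion of a RelNode-like struct with a #computeSelfCost method, combined with the extra network dimension mentioned earlier in the thread, could look roughly like the following. Every class and method name here is hypothetical, not an actual Calcite or PLC4X type.

```java
// Minimal sketch of a RelNode-like request node with a computeSelfCost-style
// method, extended with a network dimension in addition to CPU and memory.
// All names are hypothetical, not actual Calcite or PLC4X types.
public class CostSketch {

    /** A cost with a network dimension in addition to CPU and memory. */
    record RequestCost(double cpu, double memory, double network) {
        RequestCost plus(RequestCost o) {
            return new RequestCost(cpu + o.cpu, memory + o.memory, network + o.network);
        }
    }

    /** A request node, analogous to a RelNode with #computeSelfCost. */
    interface RequestNode {
        RequestCost computeSelfCost();
    }

    /** A physical read: per-request overhead plus bytes on the wire. */
    record ReadRequest(int bytesTransferred) implements RequestNode {
        public RequestCost computeSelfCost() {
            return new RequestCost(0.1, bytesTransferred, 1.0 + bytesTransferred);
        }
    }

    public static void main(String[] args) {
        // Compare eight 1-byte masked reads with one 1-byte read:
        RequestCost eight = new RequestCost(0, 0, 0);
        for (int i = 0; i < 8; i++) {
            eight = eight.plus(new ReadRequest(1).computeSelfCost());
        }
        RequestCost one = new ReadRequest(1).computeSelfCost();
        System.out.println(eight.network() > one.network()); // the collapsed plan wins
    }
}
```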
    >     >
    >     >    Best,
    >     >    Danny Chan
    >     >    On Aug 19, 2019 at 5:20 PM +0800, Julian Feinauer <
    >     >> Hi folks,
    >     >>
    >     >> I’m here again with another PLC4X related question (
    >     >> As we have more and more use cases, we encounter situations where we
    > send lots of requests to PLCs which one could sometimes optimize.
    >     >> This has multiple upstream reasons (like multiple different
    > services sending, or wanting two logically different addresses which could
    > be physically equal).
    >     >>
    >     >> So, we are considering adding some kind of optimizer which takes a batch
    > of requests and tries to arrange them in an "optimal" way with regard to
    > some cost function.
    >     >> The cost functions would of course be given by each driver, but the
    > optimizer could/should be rather general (possibly with pluggable rules).
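The "general optimizer with pluggable rules and driver-supplied cost functions" could, in its simplest heuristic form, be a greedy fixpoint loop like the sketch below. All names are invented for illustration; this is neither the Calcite planner nor PLC4X code.

```java
// Minimal sketch of a heuristic optimizer with pluggable rules and a
// driver-supplied cost function: keep applying rules and accept a rewrite
// only when it lowers the cost. Names are invented for illustration.
import java.util.List;
import java.util.function.ToDoubleFunction;

public class HeuristicPlannerSketch {

    /** A pluggable rewrite rule; returns the plan unchanged if not applicable. */
    interface Rule<P> {
        P apply(P plan);
    }

    /** Greedy fixpoint loop: apply rules until no rule improves the cost. */
    static <P> P optimize(P plan, List<Rule<P>> rules, ToDoubleFunction<P> cost) {
        boolean improved = true;
        while (improved) {
            improved = false;
            for (Rule<P> rule : rules) {
                P candidate = rule.apply(plan);
                if (cost.applyAsDouble(candidate) < cost.applyAsDouble(plan)) {
                    plan = candidate;
                    improved = true;
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Toy "plan": the number of requests. The rule merges pairs of
        // requests, halving the count while it is even.
        Rule<Integer> mergePairs = n -> (n % 2 == 0) ? n / 2 : n;
        int result = optimize(8, List.of(mergePairs), n -> (double) n);
        System.out.println(result); // 8 -> 4 -> 2 -> 1
    }
}
```

A cost-based search like Calcite's Volcano planner explores many alternatives memoized by equivalence class; the greedy loop above is the much simpler starting point hinted at further down in the mail.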
    >     >>
    >     >> As Calcite's planner already includes all of that, I ask myself whether
    > it would be possible (and make sense) to use it in PLC4X.
    >     >> Generally speaking, this raises the question of whether the Volcano
    > approach is suitable for such problems.
    >     >> The other alternative would be to start with some kind of
    > heuristic-based planning or with other optimization algorithms (genetic
    > algorithms, cross entropy, ...).
    >     >>
    >     >> Any thoughts or feedback are welcome!
    >     >>
    >     >> Julian
    >     >
    >     >
