calcite-dev mailing list archives

From Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
Subject Re: Streaming Queries Optimization
Date Fri, 21 Oct 2016 19:09:57 GMT
Sorry for not being precise when I defined my problem (the subject line
may be misleading). The join optimization I am asking about is for regular
relational joins, not streaming joins, and I can't find a combination of rules
that achieves it. The number of joins would be small (under 8), if that helps.

As for my second question, I would like to replace the cost model used in
Calcite with something like the one in
Rate-Based Query Optimization for Streaming Information Sources (
http://www-db.cs.wisc.edu/niagara/papers/rates_crc.pdf),
which is rate-based cost optimization. Every operator has its own
constant F(operator) that determines its output rate of tuples. For example, if
we have a filter and a project, there are two possible plans:

a) Project->Filter : rin*F(project)=rout(project) =>
rout(project)*F(filter)=rout(final)
b) Filter->Project : rin*F(filter)=rout(filter) =>
rout(filter)*F(project)=rout(final) ,
where rin is the rate of tuples of the input stream and rout is the rate of
output tuples,

and we choose the plan with the higher rout(final). So, my question is whether
I can use the predefined logical operators of Calcite, change their cost, and,
instead of searching for the minimum cost (as it does now), search for the
maximum output rate.
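
To make the comparison concrete, here is a minimal sketch in plain Java. It
does not use any Calcite API. The class and method names (RateBasedPlanChoice,
Op, finalRate, cost, best) and all numbers are hypothetical. Note that with
only the constant factors, both orders give rin*F(project)*F(filter), so the
sketch also assumes that each operator has a maximum input rate it can
consume (maxRate); that cap is my assumption for illustration and is not part
of the formulas above. The cost is defined as the negated output rate, which
is one possible way to let a planner that minimizes cost pick the plan with
the maximum rate.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch: choose the operator order with the highest final output rate. */
final class RateBasedPlanChoice {

    /** Hypothetical operator description; not a Calcite class. */
    static final class Op {
        final String name;
        final double f;        // F(operator): output tuples per consumed input tuple
        final double maxRate;  // ASSUMPTION: max input tuples/sec the operator can consume

        Op(String name, double f, double maxRate) {
            this.name = name;
            this.f = f;
            this.maxRate = maxRate;
        }

        /** rout = F * min(rin, maxRate); the min(...) cap is the added assumption. */
        double outputRate(double rin) {
            return f * Math.min(rin, maxRate);
        }
    }

    /** Pushes the input rate through the operators in execution order. */
    static double finalRate(double rin, List<Op> plan) {
        double rate = rin;
        for (Op op : plan) {
            rate = op.outputRate(rate);
        }
        return rate;
    }

    /** Negated rate, so that minimizing cost maximizes rout(final). */
    static double cost(double rin, List<Op> plan) {
        return -finalRate(rin, plan);
    }

    /** Returns the plan with the lowest cost, i.e. the highest output rate. */
    static List<Op> best(double rin, List<List<Op>> plans) {
        return plans.stream()
                .min(Comparator.comparingDouble(p -> cost(rin, p)))
                .orElseThrow(() -> new IllegalArgumentException("no plans given"));
    }

    public static void main(String[] args) {
        Op project = new Op("Project", 1.0, 500.0);  // made-up numbers
        Op filter = new Op("Filter", 0.1, 2000.0);   // made-up numbers
        double rin = 1000.0;

        List<Op> planA = Arrays.asList(project, filter); // a) Project->Filter
        List<Op> planB = Arrays.asList(filter, project); // b) Filter->Project

        List<Op> chosen = best(rin, Arrays.asList(planA, planB));
        System.out.println(chosen.get(0).name + "->" + chosen.get(1).name);
    }
}
```

With these made-up numbers, plan b) Filter->Project has the higher
rout(final), so it is the one selected.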

Thank you in advance,
George


2016-10-21 19:32 GMT+03:00 Julian Hyde <jhyde@apache.org>:

> I suspect that streaming join requires a plan of a very different
> shape than a regular relational join. If you're joining a stream to a
> table, and the table is small, then you can use a map join (aka a hash
> join with a small "build" side), so that's well understood from the
> world of database query optimization. But joining two streams is very
> different to joining two tables: for one thing, it is a union:
>
>  stream1 join history-of-stream2
>  union
>  history-of-stream1 join stream2
>
> And second, the size of the maps might be different than the "size" of
> the streams. If they're smaller than memory then virtually any plan
> will be OK.
>
> So where to start is easy: Figure out what physical plan you want the
> planner to create. Then work backwards and figure out a cost model
> whereby that plan is better than the other alternatives, and write
> transformation rules that can validly create that physical plan from
> your logical plan.
>
> Julian
>
>
> On Fri, Oct 21, 2016 at 8:35 AM, Γιώργος Θεοδωράκης
> <giwrgosrtheod@gmail.com> wrote:
> > Hi,
> >
> > I have two questions:
> >
> > 1) When trying to optimize the join order of a query, what rules should I
> > use? For example, I have this query:
> > "select s.orders.productid "
> > + "from s.products, s.customers, s.orders "
> > + "where s.orders.productid = s.products.productid and "
> > + "s.customers.customerid = s.orders.customerid "
> > with these sizes => orders [15 rows], products [5 rows], orders [10 rows].
> > I am using the heuristic planner with these rules for joins:
> >     this.hepPlanner.addRule(JoinToMultiJoinRule.INSTANCE);
> >     this.hepPlanner.addRule(LoptOptimizeJoinRule.INSTANCE);
> >     this.hepPlanner.addRule(MultiJoinOptimizeBushyRule.INSTANCE);
> >     this.hepPlanner.addRule(JoinPushThroughJoinRule.LEFT);
> >
> > but nothing happens, and the final logical plan changes according to the
> > order in which I list the tables after FROM in the query. Is there something
> > I am missing? What would the LoptOptimizeJoinRule do?
> >
> > 2) Is it possible to change the cost model of project, filter, aggregate, and
> > join to a rate-based one, where I want the maximum rate instead of
> > the minimum cost for optimization? Should I create new rules or override
> > the old ones? Any hints on where to start?
> >
> > 2016-10-10 15:33 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrtheod@gmail.com>:
> >
> >> Hello,
> >>
> >> I am trying to optimize the logical/physical plan of a given streaming
> >> query with Calcite and execute it in a separate engine. So far, I am using
> >> the heuristic planner and some cost-based push-down rules, and I get a
> >> "relational" optimization of the plan. By relational, I mean that this is
> >> the basic optimization I would get if my query were executed in a
> >> relational database and weren't a stream. As a result, I am not optimizing
> >> the query with streaming criteria at all.
> >>
> >> Can someone give any suggestions on further optimization of streaming
> >> queries? Is there anything more to do using Calcite, or does the
> >> optimization end with the built-in rules? Finally, any related work would be
> >> welcome.
> >>
> >> Thanks in advance,
> >> George.
> >>
>
