calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chinmay Kolhatkar <chin...@apache.org>
Subject Re: Using VolcanoPlanner to create unified RelNode tree from multiple SQL
Date Fri, 14 Oct 2016 04:39:18 GMT
Apologies for delay on this.
I've logged a JIRA for this:
https://issues.apache.org/jira/browse/CALCITE-1440


On Mon, Oct 10, 2016 at 11:13 PM, Julian Hyde <jhyde@apache.org> wrote:

> It sounds as if your requirements go beyond CALCITE-481. If so, please log
> a JIRA case.
>
> > On Oct 10, 2016, at 8:52 AM, Chinmay Kolhatkar <chinmay@apache.org>
> wrote:
> >
> > Thanks Julian. I think CALCITE-481 would be a good improvement.
> >
> > On Wed, Oct 5, 2016 at 11:02 PM, Julian Hyde <jhyde.apache@gmail.com
> <mailto:jhyde.apache@gmail.com>> wrote:
> >
> >> What you are asking for is not currently possible but would be a very
> >> interesting, useful and powerful extension to Calcite. And in my
> opinion it
> >> fits well with the way with the way that Volcano/Cascades optimizers
> work.
> >> The idea would be for the optimizer to work on a forest (a set of trees
> >> with different roots, but shared leaves).
> >>
> >> It is related to https://issues.apache.org/jira/browse/CALCITE-481,
> which
> >> deals with trees that have a single root but store intermediate results
> so
> >> that they can be used more than once. I surmise that you are also very
> >> interested in maximizing the commonality between trees.
> >>
> >> The trickiest part is to get the cost model right. You need to account
> for
> >> a cost each time the work is done, not each time the result is used.
> Jesus
> >> researched this
> >> very topic, using integer linear programming (ILP) to incorporate these
> >> considerations into the cost model.
> >>
> >> Can you please create a JIRA case for it? Maybe Jesus can add some
> >> comments there.
> >>
> >> It is also related to Hive multi-table queries [1]. I have not seen a
> SQL
> >> syntax for multi-output queries. Maybe we can devise one. Or does
> someone
> >> know of one?
> >>
> >> And it presents challenges/opportunities for engines. Some engines can
> >> push to multiple consumers, or equivalently create a stream of tuples
> that
> >> can be read by multiple consumers, others can only persist an
> intermediate
> >> result that can be read by multiple consumers. Spark had their own
> version
> >> of this discussion when implementing multitable in Hive-Spark[2].
> >>
> >> Julian
> >>
> >> [1] https://cwiki.apache.org/confluence/display/Hive/
> >> GettingStarted#GettingStarted-MULTITABLEINSERT <
> https://cwiki.apache.org/ <https://cwiki.apache.org/>
> >> confluence/display/Hive/GettingStarted#GettingStarted-MULTITABLEINSERT>
> >>
> >> [2] https://issues.apache.org/jira/browse/SPARK-3622 <
> https://issues.apache.org/jira/browse/SPARK-3622> <
> >> https://issues.apache.org/jira/browse/SPARK-3622>
> >>
> >>> On Oct 4, 2016, at 9:23 AM, Chinmay Kolhatkar <chinmay@datatorrent.com
> >
> >> wrote:
> >>>
> >>> Dear Community,
> >>>
> >>> I'm working on integration with Apache Apex and Apache Calcite
> >> (underlying
> >>> Apache Jira is: APEXMALHAR-1818).
> >>>
> >>> I have a question related to conversion from SQL to RelNode Tree.
> >>>
> >>> Is it possible that VolcanoPlanner can take multiple SQL statements as
> >>> input and return a unified RelNode Tree?
> >>> Example of above is as follows:
> >>>
> >>> SELECT COL1, COL2 FROM TABLE WHERE COL3 > 10;
> >>> SELECT COL1, COL2 FROM TABLE WHERE COL4  = 'abc';
> >>>
> >>> Above 2 statements has a common path and hence can provide an unified
> >>> RelNode tree as follows:
> >>>
> >>> [Scan] -> [Project (COL1, COL2)] -> [Filter (COL4 = 'abc')] ->
[Delta]
> >>>                   |
> >>>                   V
> >>>           [Filter (COL3 > 10)]
> >>>                   |
> >>>                   v
> >>>                [Delta]
> >>>
> >>>
> >>> Thanks,
> >>> Chinmay.
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message