calcite-dev mailing list archives

From Chinmay Kolhatkar <chin...@apache.org>
Subject Re: Using VolcanoPlanner to create unified RelNode tree from multiple SQL
Date Mon, 10 Oct 2016 15:52:23 GMT
Thanks Julian. I think CALCITE-481 would be a good improvement.

On Wed, Oct 5, 2016 at 11:02 PM, Julian Hyde <jhyde.apache@gmail.com> wrote:

> What you are asking for is not currently possible but would be a very
> interesting, useful and powerful extension to Calcite. And in my opinion it
> fits well with the way that Volcano/Cascades optimizers work.
> The idea would be for the optimizer to work on a forest (a set of trees
> with different roots, but shared leaves).
>
> It is related to https://issues.apache.org/jira/browse/CALCITE-481, which
> deals with trees that have a single root but store intermediate results so
> that they can be used more than once. I surmise that you are also very
> interested in maximizing the commonality between trees.
>
> The trickiest part is to get the cost model right. You need to account for
> a cost each time the work is done, not each time the result is used. Jesus
> researched this very topic, using integer linear programming (ILP) to
> incorporate these considerations into the cost model.
>
> Can you please create a JIRA case for it? Maybe Jesus can add some
> comments there.
>
> It is also related to Hive multi-table queries [1]. I have not seen a SQL
> syntax for multi-output queries. Maybe we can devise one. Or does someone
> know of one?
>
> And it presents challenges/opportunities for engines. Some engines can
> push to multiple consumers, or equivalently create a stream of tuples that
> can be read by multiple consumers; others can only persist an intermediate
> result that can be read by multiple consumers. Spark had their own version
> of this discussion when implementing multitable in Hive-Spark[2].
>
> Julian
>
> [1] https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-MULTITABLEINSERT
>
> [2] https://issues.apache.org/jira/browse/SPARK-3622
>
> > On Oct 4, 2016, at 9:23 AM, Chinmay Kolhatkar <chinmay@datatorrent.com> wrote:
> >
> > Dear Community,
> >
> > I'm working on integrating Apache Apex with Apache Calcite (the
> > underlying Apache JIRA issue is APEXMALHAR-1818).
> >
> > I have a question related to conversion from SQL to RelNode Tree.
> >
> > Can VolcanoPlanner take multiple SQL statements as input and return a
> > unified RelNode tree? For example:
> >
> > SELECT COL1, COL2 FROM TABLE WHERE COL3 > 10;
> > SELECT COL1, COL2 FROM TABLE WHERE COL4 = 'abc';
> >
> > The two statements above share a common path and hence could be
> > represented as a unified RelNode tree, as follows:
> >
> > [Scan] -> [Project (COL1, COL2)] -> [Filter (COL4 = 'abc')] -> [Delta]
> >                    |
> >                    v
> >            [Filter (COL3 > 10)]
> >                    |
> >                    v
> >                 [Delta]
> >
> >
> > Thanks,
> > Chinmay.
>
>
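
The "forest with shared leaves" idea discussed in the thread can be sketched in plain Java. This is a toy model only: PlanInterner and its Node class are hypothetical names and are not part of Calcite, and the operator labels are plain strings rather than real RelNodes. Each node is identified by a canonical digest string, so two plans built independently end up sharing any structurally identical subtree. The plan shape mirrors the diagram in the original question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy sketch (hypothetical, not Calcite API): shares identical subtrees by digest. */
final class PlanInterner {
    /** A plan operator identified by its canonical digest string. */
    static final class Node {
        final String digest;
        final List<Node> inputs;

        Node(String digest, List<Node> inputs) {
            this.digest = digest;
            this.inputs = inputs;
        }
    }

    private final Map<String, Node> byDigest = new HashMap<>();

    /** Returns the one canonical node for this operator and inputs, creating it if absent. */
    Node intern(String op, Node... inputs) {
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < inputs.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(inputs[i].digest);
        }
        String digest = sb.append(')').toString();
        return byDigest.computeIfAbsent(
            digest, d -> new Node(d, Arrays.asList(inputs.clone())));
    }

    /** Number of distinct operators across all plans registered so far. */
    int distinctNodes() {
        return byDigest.size();
    }

    public static void main(String[] args) {
        PlanInterner planner = new PlanInterner();
        // Each query is built on its own, as if it came from a separate SQL statement.
        Node q1 = planner.intern("Filter[COL3 > 10]",
            planner.intern("Project[COL1, COL2]", planner.intern("Scan[TABLE]")));
        Node q2 = planner.intern("Filter[COL4 = 'abc']",
            planner.intern("Project[COL1, COL2]", planner.intern("Scan[TABLE]")));

        // Two different roots, one shared Project/Scan path.
        System.out.println(q1.inputs.get(0) == q2.inputs.get(0)); // true
        System.out.println(planner.distinctNodes());              // 4
    }
}
```

A real planner would also have to recognize subtrees that are equivalent but not textually identical, which this sketch does not attempt.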

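The cost-model point raised in the thread (charge for work each time it is done, not each time its result is used) can be illustrated with a small accounting sketch. It is not the ILP formulation mentioned above and not Calcite's cost API; ForestCost, PlanNode, and the cost figures are all made up for illustration. It only contrasts the two ways of summing cost over a forest whose roots share a subtree.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy sketch (hypothetical, not Calcite API): two ways to cost a forest of plans. */
final class ForestCost {
    /** A plan operator with a made-up cost for its own work. */
    static final class PlanNode {
        final String label;
        final double selfCost;
        final List<PlanNode> inputs;

        PlanNode(String label, double selfCost, PlanNode... inputs) {
            this.label = label;
            this.selfCost = selfCost;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Charges a node every time it is reached, so shared subtrees are paid for repeatedly. */
    static double perUseCost(List<PlanNode> roots) {
        double total = 0;
        for (PlanNode root : roots) {
            total += treeCost(root);
        }
        return total;
    }

    private static double treeCost(PlanNode node) {
        double total = node.selfCost;
        for (PlanNode input : node.inputs) {
            total += treeCost(input);
        }
        return total;
    }

    /** Charges each distinct node once, however many consumers read its result. */
    static double perWorkCost(List<PlanNode> roots) {
        Set<PlanNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        double total = 0;
        for (PlanNode root : roots) {
            total += visit(root, seen);
        }
        return total;
    }

    private static double visit(PlanNode node, Set<PlanNode> seen) {
        if (!seen.add(node)) {
            return 0; // work already accounted for via another consumer
        }
        double total = node.selfCost;
        for (PlanNode input : node.inputs) {
            total += visit(input, seen);
        }
        return total;
    }

    public static void main(String[] args) {
        PlanNode scan = new PlanNode("Scan", 100);
        PlanNode project = new PlanNode("Project", 20, scan);
        PlanNode filterA = new PlanNode("Filter COL3 > 10", 5, project);
        PlanNode filterB = new PlanNode("Filter COL4 = 'abc'", 5, project);
        List<PlanNode> roots = Arrays.asList(filterA, filterB);

        System.out.println(perUseCost(roots));  // 250.0
        System.out.println(perWorkCost(roots)); // 130.0
    }
}
```

This sketch ignores any cost of persisting or fanning out the shared result, which the engine discussion above suggests would matter in practice.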