calcite-dev mailing list archives

From Alessandro Solimando <alessandro.solima...@gmail.com>
Subject Re: Apache Calcite Spark Adaptor GSOC 2018 Linan Zheng
Date Sat, 17 Mar 2018 07:40:55 GMT
In my experience, if the engine's "native" optimizer cannot be turned off, it
can undo some of your optimizations when you submit your already-optimized
program/SQL query to the engine.

As far as Spark 2.x is concerned, I am not aware of any way to turn Catalyst
off, so if you have a different cost model and/or query planner you might
easily end up with a different logical and/or physical plan than the one you
expect.
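
A minimal sketch of what I mean, assuming a local Spark 2.x session (the
class name and the toy query are just for illustration): inspecting the
analyzed and optimized plans of any submitted query shows that the optimizer
phase always runs.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CatalystAlwaysOn {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("catalyst-always-on")
        .getOrCreate();

    // Submit a query that is already in the shape we want; Catalyst
    // still applies its optimizer rules before physical planning,
    // and there is no switch to skip that phase entirely.
    Dataset<Row> df = spark.sql(
        "SELECT id FROM (SELECT 1 AS id, 'a' AS name) t WHERE id = 1");

    // Comparing the two plans exposes the rewrites Catalyst applied
    // (e.g. column pruning and filter/constant simplification).
    System.out.println(df.queryExecution().analyzed());
    System.out.println(df.queryExecution().optimizedPlan());

    spark.stop();
  }
}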

In the "Calcite performance benchmark" discussion, started by Edmon
Begoli, this
fact is addressed, as he proposed to evaluate Calcite with/without the
"native" optimizer, which makes a lot of sense to me and can lead to
surprising results.

My knowledge of Catalyst internals is unfortunately pretty shallow, so I
cannot tell to what extent this is an issue, or whether potential problems
can be bypassed by using hints or similar techniques.

If anyone knows more or has practical examples on the subject, I would be
very interested in hearing about them.
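
For context, the Spark adapter that Julian describes below is enabled through
a JDBC connect-string property. Here is a minimal sketch, assuming the
calcite-spark module is on the classpath (the class name and the toy VALUES
query are just placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CalciteSparkSketch {
  public static void main(String[] args) throws Exception {
    // spark=true asks Calcite to produce a physical plan made of Spark
    // relational operators and to run it as a Spark program, so Spark
    // SQL and Catalyst are never involved.
    try (Connection connection =
             DriverManager.getConnection("jdbc:calcite:spark=true");
         Statement statement = connection.createStatement();
         ResultSet resultSet =
             statement.executeQuery("values (1, 'a'), (2, 'b')")) {
      while (resultSet.next()) {
        System.out.println(
            resultSet.getInt(1) + " " + resultSet.getString(2));
      }
    }
  }
}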

Best regards,
Alessandro

On 16 March 2018 at 22:35, Julian Hyde <jhyde.apache@gmail.com> wrote:

> The purpose of Calcite’s Spark Adapter is to circumvent Spark SQL and
> Catalyst entirely. Calcite parses the SQL, optimizes it to create a
> physical plan that uses Spark relational operators, then converts that
> plan to a Spark program.
>
> If you want to use Spark SQL and Catalyst that’s totally fine, but don’t
> use Calcite for those cases.
>
> Julian
>
>
> > On Mar 16, 2018, at 11:44 AM, Linan Zheng <lazheng@bu.edu> wrote:
> >
> > Hi Everyone,
> >
> > My name is Linan Zheng, and I am currently a senior CS student at
> > Boston University. I am fascinated by the idea of adding Apache
> > Spark's DataFrame/Dataset API support to Apache Calcite. Right now I
> > am working on the proposal, and I hope I can get some advice on it.
> > My question is: since Spark already implements the Catalyst query
> > optimizer in Spark SQL, how should I approach Catalyst's planning
> > rules (logical and physical)? And which system should be in charge of
> > the query optimization? Any advice and corrections will be much
> > appreciated, and thank you for reading this email.
> >
> > --
> > Best Regards,
> > Linan Zheng
>
>
