spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Debajyoti Roy <newroy...@gmail.com>
Subject Re: Why Apache Spark doesn't use Calcite?
Date Wed, 15 Jan 2020 17:23:05 GMT
Thanks all, and Matei.

TL;DR of the conclusion for my particular case:
Qualitatively, while Catalyst[1] tries to mitigate learning curve and
maintenance burden, it lacks the dynamic programming approach used by
Calcite[2] and risks falling into local minima.
Quantitatively, there is no reproducible benchmark, that fairly compares
Optimizer frameworks, apples to apples (excluding execution).

References:
[1] -
https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf
[2] - https://arxiv.org/pdf/1802.10233.pdf

On Mon, Jan 13, 2020 at 5:37 PM Matei Zaharia <matei.zaharia@gmail.com>
wrote:

> I’m pretty sure that Catalyst was built before Calcite, or at least in
> parallel. Calcite 1.0 was only released in 2015. From a technical
> standpoint, building Catalyst in Scala also made it more concise and easier
> to extend than an optimizer written in Java (you can find various
> presentations about how Catalyst works).
>
> Matei
>
> > On Jan 13, 2020, at 8:41 AM, Michael Mior <mmior@apache.org> wrote:
> >
> > It's fairly common for adapters (Calcite's abstraction of a data
> > source) to push down predicates. However, the API certainly looks a
> > lot different than Catalyst's.
> > --
> > Michael Mior
> > mmior@apache.org
> >
> > Le lun. 13 janv. 2020 à 09:45, Jason Nerothin
> > <jasonnerothin@gmail.com> a écrit :
> >>
> >> The implementation they chose supports push down predicates, Datasets
> and other features that are not available in Calcite:
> >>
> >> https://databricks.com/glossary/catalyst-optimizer
> >>
> >> On Mon, Jan 13, 2020 at 8:24 AM newroyker <newroyker@gmail.com> wrote:
> >>>
> >>> Was there a qualitative or quantitative benchmark done before a design
> >>> decision was made not to use Calcite?
> >>>
> >>> Are there limitations (for heuristic based, cost based, * aware
> optimizer)
> >>> in Calcite, and frameworks built on top of Calcite? In the context of
> big
> >>> data / TCPH benchmarks.
> >>>
> >>> I was unable to dig up anything concrete from user group / Jira.
> Appreciate
> >>> if any Catalyst veteran here can give me pointers. Trying to defend
> >>> Spark/Catalyst.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >>>
> >>
> >>
> >> --
> >> Thanks,
> >> Jason
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >
>
>

Mime
View raw message