calcite-dev mailing list archives

From "jordan.halterman@gmail.com" <jordan.halter...@gmail.com>
Subject Re: Calcite vs Catalyst
Date Thu, 16 Feb 2017 23:42:16 GMT
Calcite differs from Catalyst in many ways. First of all, Catalyst is essentially a heuristic
optimizer, while Calcite optimizers often combine heuristics with cost-based optimization.
Catalyst pushes down predicates and projections to most data sources, while Calcite can often
push down full queries. Calcite is certainly also capable of pushing down filters for struct
fields (the SPARK-19638 case). Some of these features, like SPARK-19609, may have to be
implemented as custom rules. That said, we've successfully replaced Spark's Catalyst optimizer
with Calcite and have recorded performance improvements of up to two orders of magnitude
running TPC-DS queries against many databases.
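
To make the heuristic vs. cost-based distinction concrete, here is a toy sketch in Python.
None of it is Calcite or Catalyst API: the plan classes, the rewrite rule, and the cost model
(rows read per operator, plus an assumed selectivity and a join that outputs the larger
side's row count) are all made up for illustration. A heuristic optimizer would apply the
rewrite whenever it matches; a cost-based one would keep both plans and pick the cheaper
one under its cost model.

```python
from dataclasses import dataclass

# Toy plan nodes (hypothetical; not Calcite's RelNode or Catalyst's LogicalPlan).
@dataclass(frozen=True)
class Scan:
    table: str
    rows: int            # assumed row-count estimate

@dataclass(frozen=True)
class Filter:
    child: object
    column: str          # "table.column" the predicate refers to
    selectivity: float   # assumed fraction of rows kept

@dataclass(frozen=True)
class Join:
    left: object
    right: object

def tables(plan):
    """Return the set of table names a plan reads."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Filter):
        return tables(plan.child)
    return tables(plan.left) | tables(plan.right)

def push_filter_below_join(plan):
    """Rewrite Filter(Join(l, r)) so the filter sits on the side it refers to."""
    if isinstance(plan, Filter) and isinstance(plan.child, Join):
        join = plan.child
        table = plan.column.split(".")[0]
        if table in tables(join.left):
            return Join(Filter(join.left, plan.column, plan.selectivity), join.right)
        if table in tables(join.right):
            return Join(join.left, Filter(join.right, plan.column, plan.selectivity))
    return plan  # rule does not match; leave the plan unchanged

def rows_out(plan):
    """Estimated output rows (toy model: a join outputs the larger side)."""
    if isinstance(plan, Scan):
        return plan.rows
    if isinstance(plan, Filter):
        return rows_out(plan.child) * plan.selectivity
    return max(rows_out(plan.left), rows_out(plan.right))

def cost(plan):
    """Toy cost: total rows each operator has to read."""
    if isinstance(plan, Scan):
        return plan.rows
    if isinstance(plan, Filter):
        return cost(plan.child) + rows_out(plan.child)
    return (cost(plan.left) + cost(plan.right)
            + rows_out(plan.left) + rows_out(plan.right))

def choose_cheapest(candidates):
    """Cost-based step: keep the alternative with the lowest estimated cost."""
    return min(candidates, key=cost)

original = Filter(Join(Scan("orders", 1_000_000), Scan("customers", 1_000)),
                  "customers.country", 0.1)
pushed = push_filter_below_join(original)   # what a heuristic optimizer would emit
best = choose_cheapest([original, pushed])  # what a cost-based optimizer would pick
```

In this example both approaches end up with the same plan. They diverge when a rewrite is
legal but not always cheaper (join reordering is the usual case), which is where having a
cost model starts to pay off.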

Whether there's value in using Calcite in Spark depends on your use case. Drill and other
Calcite-based systems are certainly sufficient if you want to take better advantage of the
features of the underlying databases.
It's not easy to build the conversions between Catalyst plans and Calcite plans - it took
us months - but doing so allowed us to continue using Spark's popular programmatic APIs while
significantly improving its performance when querying relational databases, Mongo, etc.

> On Feb 16, 2017, at 3:28 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> 
> Heya,
> 
> I've been using Spark recently and have stumbled across a couple surprising
> bugs/feature gaps. It got me curious about how Calcite would handle the
> same scenarios. Basically, I'm wondering if Calcite would handle these
> scenarios directly or if it would defer to the underlying runtime. I.E.,
> would I be better off for this task with Calcite via Hive or Drill vs.
> Catalyst via Spark.
> 
> Here are the tickets for reference.
> 
> SPARK-19615 Provide Dataset union convenience for divergent schema
> SPARK-19609 Broadcast joins should pushdown join constraints as Filter to
> the larger relation
> SPARK-19638 Filter pushdown not working for struct fields
> 
> Thanks in advance!
> Nick
