calcite-dev mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Calcite with Phoenix and Spark
Date Sat, 22 Oct 2016 23:41:30 GMT
Hi Eli,
With the calcite branch of Phoenix you're partway there. I think a good
way to approach this would be to create a new set of operators that
correspond to Spark operations, along with the rules that know when
to use them. These could then be costed alongside the other Phoenix operators at
planning time. Spark would work especially well for storing intermediate
results in more complex queries.
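A toy Python model of that idea: each hypothetical "rule" proposes a physical implementation of the same logical join in a different engine, and the planner keeps the cheapest one. The operator names, the one-million-row threshold and the cost formulas are all invented for illustration. In Calcite itself such operators would be Java `RelNode` classes in their own `Convention`, reporting costs through `computeSelfCost`; this sketch does not use Calcite at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class PhysicalOp:
    name: str         # hypothetical operator name
    convention: str   # which engine executes it: "PHOENIX" or "SPARK"
    cost: float       # made-up cost units, for illustration only

# A "rule" proposes a physical implementation of a logical join,
# or returns None when it cannot apply.
Rule = Callable[[int, int], Optional[PhysicalOp]]

def phoenix_hash_join_rule(left_rows: int, right_rows: int) -> Optional[PhysicalOp]:
    # Assumption: the smaller side must fit in memory as a hash table.
    smaller = min(left_rows, right_rows)
    if smaller > 1_000_000:
        return None
    return PhysicalOp("PhoenixHashJoin", "PHOENIX", left_rows + right_rows + 0.5 * smaller)

def spark_join_rule(left_rows: int, right_rows: int) -> Optional[PhysicalOp]:
    # Assumption: a fixed job start-up overhead plus shuffling both inputs.
    return PhysicalOp("SparkJoin", "SPARK", 10_000 + 2.0 * (left_rows + right_rows))

RULES: List[Rule] = [phoenix_hash_join_rule, spark_join_rule]

def choose_join(left_rows: int, right_rows: int) -> PhysicalOp:
    """Apply every rule, then keep the cheapest alternative, as a cost-based planner would."""
    proposals = (rule(left_rows, right_rows) for rule in RULES)
    candidates = [op for op in proposals if op is not None]
    return min(candidates, key=lambda op: op.cost)

print(choose_join(1_000, 500).name)              # PhoenixHashJoin (cost 1750 vs 13000)
print(choose_join(50_000_000, 5_000_000).name)   # SparkJoin (the Phoenix rule does not apply)
```

A real planner compares alternatives for whole subtrees rather than single operators, but the decision has the same shape: several engines offer an implementation, and cost picks between them.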

Since Spark doesn't integrate natively with Calcite, I think using Spark
directly may not get you where you need to go. Likewise, the
Phoenix-Spark integration is higher level: it is built on top of Phoenix and has
no direct integration with Calcite.

Another alternative to consider would be Drillix (Drill + Phoenix),
which uses Calcite underneath [1].

Thanks,
James

[1]
https://apurtell.s3.amazonaws.com/phoenix/Drillix+Combined+Operational+%26+Analytical+SQL+at+Scale.pdf

On Sat, Oct 22, 2016 at 1:02 PM, Eli Levine <elilevine@gmail.com> wrote:

> Greetings, Calcite devs. First of all, thank you for your work on Calcite!
>
> I am working on a federated query engine that will use Spark (or something
> similar) as the main execution engine. Among other data sources the query
> engine will read from Apache Phoenix tables/views. The hope is to utilize
> Calcite as the query planner and optimizer component of this query engine.
>
> At a high level, I am trying to build the following using Calcite:
> 1. Generate a relational algebra expression tree using RelBuilder based on
> user input. I plan to implement custom schema and table classes based on my
> metadata.
> 2. Provide Calcite with query optimization rules.
> 3. Traverse the optimized expression tree to generate a set of Spark
> instructions.
> 4. Execute query instructions via Spark.
>
> A few questions regarding the above:
> 1. Are there existing examples of code that does #3 above? I looked at the
> Spark submodule and it seems pretty bare-bones. What would be great to see
> is an example of a RelNode tree being traversed to create a plan for
> asynchronous execution via something like Spark or Pig.
> 2. An important query optimization planned for the first version is
> pushing simple filters down to Phoenix (the plan is to use the Phoenix-Spark
> <http://phoenix.apache.org/phoenix_spark.html> integration for reading
> data). Any examples of such push-downs to specific data sources in a
> federated query scenario would be much appreciated.
>
> Thank you! Looking forward to working with the Calcite community.
>
> -------------
> Eli Levine
> Software Engineering Architect -- Salesforce.com
>
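Questions 1 and 2 above (walking an optimized tree to produce execution steps, and pushing simple filters down to Phoenix) can be sketched with a similar toy model. The node classes, the `is_simple` test and the step strings are hypothetical stand-ins, not Calcite, Spark or Phoenix-Spark APIs. Within Calcite, a comparable effect is available by having a table implement `ProjectableFilterableTable`, whose `scan` method receives the candidate filters.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Scan:
    table: str
    pushed_filters: List[str] = field(default_factory=list)  # evaluated by Phoenix

@dataclass
class Filter:
    condition: str
    input: "Node"

@dataclass
class Project:
    columns: List[str]
    input: "Node"

Node = Union[Scan, Filter, Project]

def is_simple(condition: str) -> bool:
    # Hypothetical placeholder: treat any condition without a function call as pushable.
    return "(" not in condition

def push_down_filters(node: Node) -> Node:
    """Bottom-up rewrite: a simple Filter directly over a Scan becomes a Scan predicate."""
    if isinstance(node, Filter):
        child = push_down_filters(node.input)
        if isinstance(child, Scan) and is_simple(node.condition):
            return Scan(child.table, child.pushed_filters + [node.condition])
        return Filter(node.condition, child)
    if isinstance(node, Project):
        return Project(node.columns, push_down_filters(node.input))
    return node

def to_steps(node: Node) -> List[str]:
    """Post-order traversal: emit the input's steps before the node's own step."""
    if isinstance(node, Scan):
        where = " WHERE " + " AND ".join(node.pushed_filters) if node.pushed_filters else ""
        return [f"scan phoenix:{node.table}{where}"]
    if isinstance(node, Filter):
        return to_steps(node.input) + [f"filter {node.condition}"]
    if isinstance(node, Project):
        return to_steps(node.input) + [f"select {', '.join(node.columns)}"]
    raise TypeError(f"unsupported node: {node!r}")

plan = Project(["id", "name"],
               Filter("upper(name) = 'X'",
                      Filter("region = 'EU'", Scan("ACCOUNTS"))))
for step in to_steps(push_down_filters(plan)):
    print(step)
# scan phoenix:ACCOUNTS WHERE region = 'EU'
# filter upper(name) = 'X'
# select id, name
```

The simple predicate is folded into the Phoenix scan, while the function-based one stays as a separate step for the execution engine, which mirrors the split between source-side and engine-side work described in question 2.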
