ignite-dev mailing list archives

From Roman Kondakov <kondako...@mail.ru.INVALID>
Subject Re: New SQL execution engine
Date Fri, 27 Sep 2019 09:34:43 GMT
Hi Igor!

In my opinion, using Apache Calcite for distributed SQL query
optimization and planning is a much more promising approach than using H2.
H2 is not suitable for distributed query execution, and it has very
limited abilities for query optimization. Apache Calcite, by contrast, is
an open source implementation of the Cascades/Volcano query optimization
framework [1,2] (other implementations: MS SQL Server, Greenplum). The
main advantage of this framework is its extensibility: we can change
the optimizer behavior by simply adding or removing optimization
rules. Calcite has a cost-based optimizer as well as a heuristic one, which
can be useful in some situations.
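
To make the "adding or removing rules" point concrete, here is a minimal
toy sketch in plain Java. It is not Calcite's API: the class `RuleSketch`,
the rule names, and the representation of a plan as a bottom-up list of
operator names are all made up for illustration, and predicates and
column lists are left out entirely. It only shows how the set of
registered rewrite rules determines the resulting plan.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy rule-driven plan rewriter; every name here is hypothetical, none come from Calcite. */
final class RuleSketch {
    /** Named rewrite rules, applied in registration order. */
    private final Map<String, UnaryOperator<List<String>>> rules = new LinkedHashMap<>();

    void addRule(String name, UnaryOperator<List<String>> rule) { rules.put(name, rule); }

    void removeRule(String name) { rules.remove(name); }

    /** Applies every registered rule repeatedly until the plan stops changing. */
    List<String> optimize(List<String> plan) {
        List<String> current = new ArrayList<>(plan);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<List<String>> rule : rules.values()) {
                List<String> next = rule.apply(current);
                if (!next.equals(current)) {
                    current = next;
                    changed = true;
                }
            }
        }
        return current;
    }

    /** Rule: move a Filter below an adjacent Project (the plan is listed bottom-up). */
    static List<String> pushFilterBelowProject(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        for (int i = 0; i + 1 < out.size(); i++) {
            if (out.get(i).equals("Project") && out.get(i + 1).equals("Filter")) {
                out.set(i, "Filter");
                out.set(i + 1, "Project");
                return out;
            }
        }
        return out;
    }

    /** Rule: collapse two adjacent Filters into one. */
    static List<String> mergeFilters(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        for (int i = 0; i + 1 < out.size(); i++) {
            if (out.get(i).equals("Filter") && out.get(i + 1).equals("Filter")) {
                out.remove(i + 1);
                return out;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> plan = List.of("Scan", "Filter", "Project", "Filter");
        RuleSketch optimizer = new RuleSketch();
        optimizer.addRule("pushFilter", RuleSketch::pushFilterBelowProject);
        optimizer.addRule("mergeFilters", RuleSketch::mergeFilters);
        System.out.println(optimizer.optimize(plan)); // both rules active
        optimizer.removeRule("mergeFilters");
        System.out.println(optimizer.optimize(plan)); // only the push-down rule
    }
}
```

With both rules registered, the two filters end up adjacent and are
merged; after removing `mergeFilters`, the same input keeps both filters.
The optimizer loop itself does not change, only the rule set does.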

The main challenges I see here:

1. Implementing distributed query planning for Apache Calcite (it
was primarily developed for single-node query optimization). We can
reuse the solution of the Apache Drill [3] guys here.

2. We need to implement a new distributed query execution engine. Apache
Calcite is a query planning framework, not an execution one,
although it has some abilities for executing queries in the single-node
case.

3. Secondary indexes are not supported by Calcite, so we need to
overcome this problem somehow. AFAIK the Apache Phoenix [4] guys implemented
support for secondary indexes as sorted materialized views.

4. Apache Calcite is a cost-based optimizer, so we need to create our
own cost model and gather statistics to be able to choose the most
efficient query execution plans.

5. What about deprecating our current query API, which has a number of
drawbacks, such as using `List<?>` as a shortcut for a query result, or
multiple redundant flags in `SqlFieldsQuery` (collocated, lazy, etc.) that
are useless for the new query execution engine?
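
As a rough illustration of point 4, here is a toy sketch of how gathered
statistics could feed a cost model that picks between two access paths.
Everything in it is an assumption made for the example: the class
`CostSketch`, the `Stats` fields, the cost formulas, and the
`RANDOM_ACCESS_PENALTY` constant are hypothetical and are not taken from
Calcite or Ignite.

```java
/** Toy cost model; all names and formulas are hypothetical, for illustration only. */
final class CostSketch {
    /** Assumed extra cost of fetching one row through an index versus a sequential read. */
    static final double RANDOM_ACCESS_PENALTY = 4.0;

    /** Hypothetical statistics gathered for one table and one predicate. */
    static final class Stats {
        final long rowCount;
        final double selectivity; // fraction of rows matching the predicate, in [0, 1]

        Stats(long rowCount, double selectivity) {
            this.rowCount = rowCount;
            this.selectivity = selectivity;
        }
    }

    /** Assumed cost of a full scan: every row is read once. */
    static double fullScanCost(Stats s) {
        return s.rowCount;
    }

    /** Assumed cost of an index scan: a tree descent plus a penalty per matching row. */
    static double indexScanCost(Stats s) {
        double descent = Math.log(Math.max(s.rowCount, 2)) / Math.log(2);
        return descent + RANDOM_ACCESS_PENALTY * s.selectivity * s.rowCount;
    }

    /** Picks the cheaper of the two candidate access paths. */
    static String choosePlan(Stats s) {
        return indexScanCost(s) < fullScanCost(s) ? "IndexScan" : "FullScan";
    }

    public static void main(String[] args) {
        // Highly selective predicate on a large table.
        System.out.println(choosePlan(new Stats(1_000_000, 0.001)));
        // Predicate matching half of the rows.
        System.out.println(choosePlan(new Stats(1_000_000, 0.5)));
    }
}
```

Under these assumed formulas the selective predicate favors the index
and the unselective one favors the full scan. A real cost model would
also have to account for network transfer between nodes, which this
sketch ignores.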

[1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[3] https://drill.apache.org/
[4] https://phoenix.apache.org/
-- 
Kind Regards
Roman Kondakov

On 27.09.2019 11:44, Igor Seliverstov wrote:
> Hi Igniters!
>
> As you might know currently we have many open issues relating to current H2 based engine
> and its execution flow.
>
> Some of them are critical (like impossibility to execute particular queries), some of
> them are majors (like impossibility to execute particular queries without pre-preparation
> your data to have a collocation) and many minors.
>
> Most of the issues cannot be solved without whole engine redesign.
>
> So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'll appreciate if you share your thoughts on top of that.
>
> Regards,
> Igor
