calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Danny Chan <yuzhao....@gmail.com>
Subject Re: Using indexes rather than table scans with Calcite
Date Fri, 29 May 2020 23:48:08 GMT
Calcite does support table hint now, it's syntax is Oracle style[1], i saw that many engines
support a INDEX hint to force a index scan on table[2] [3], maybe you can have a try also.

[1] https://calcite.apache.org/docs/reference.html#sql-hints
[2] https://docs.oracle.com/cd/B13789_01/server.101/b10752/hintsref.htm#5156
[3] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-ver15

Best,
Danny Chan
在 2020年5月29日 +0800 PM4:44,Tim Fox <tfox@confluent.io>,写道:
> Hi,
>
> I'm building a query engine with Calcite - really enjoying working with
> Calcite so far!
>
> When creating a plan, it seems Calcite always creates a plan where the
> sources are table scans, however in my implementation the tables can have
> indexes on them so a table scan is not always the right choice.
>
> I was wondering if there was any way of making Calcite "index aware" - e.g.
> perhaps providing hints to the table scan instance that, actually, an index
> scan or a primary key lookup should be used instead of actually scanning
> the table. E.g. On the table meta-data if we provided information about any
> indexes on the table, then Calcite could figure out what parts of the query
> to push to the table scan and which to keep in the rest of the plan.
>
> There are two specific cases I really care about:
>
> 1. Queries that contain a primary key lookup:
>
> select * from some_table where key_column=23 AND some_other_column='foo';
>
> In the above case the 'select * from some_table where key_column=23' can be
> implemented as a simple PK lookup in the source table, not requiring a
> scan, thus leaving just the filter corresponding to
> 'some_other_column='foo'' in the rest of the plan
>
> 2. Queries with expressions on a column which has a secondary index
>
> select * from users where country='UK' and some_other_column='foo';
>
> We have many users, and let's say 10% of them are from UK (still a lot). We
> have a secondary index in the country column in the source table so we can
> do an efficient index scan to retrieve the matching records.
>
> I found this document
> https://calcite.apache.org/docs/materialized_views.html which seems like it
> might help me in some way.
>
> The idea being if I can think of my indexes as materialized views then the
> query can be written against those materialized views as sources instead of
> the original table sources. There appears to be a rule
> 'MaterializedViewRule' that does this already (?).
>
> This seems to get me a bit further, however, for this approach to work, it
> seems I would have to create materialized views _dynamically_ during
> evaluation of the query, register them, rewrite the query, execute it, then
> deregister the materialized view.
>
> E.g. for the primary key lookup example above, for the following query:
>
> select * from some_table where key_column=23 AND some_other_column='foo';
>
> I would need to dynamically create a materialized view corresponding to:
>
> select * from some_table where key_column=23
>
> Then rewrite the query using MaterializedViewRule.
>
> In the general case, in order to figure out what materialized views I need
> to dynamically create I would need to examine the query, figure out which
> columns in expressions have indexes on them and from them work out the best
> materialized view to create based on that information. This seems non
> trivial.
>
> Does anyone have any suggestions or pointers for how to implement this kind
> of thing? I suspect I'm not the first person to have tried to do this, as
> using indexes on tables seems a pretty common thing in many systems (?)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message