calcite-dev mailing list archives

From Vladimir Sitnikov <sitnikov.vladi...@gmail.com>
Subject Re: Filter push
Date Mon, 13 Oct 2014 20:59:52 GMT
>* ProjectableCursorableTable goes further, and allows Calcite to
>specify a list of projected fields and a list of filters. The cursor
>must implement the projects, but it can choose which filters it is
>able to implement.

I am against such interfaces.
I would be happy to be proven wrong.

This looks like a rabbit hole. It is a powerful feature, however:
1) It seems hard to make it fast
Effectively, it forces the engine to interpret the whole thing since Calcite
won't know if some of the filters are implemented by the table or not.
We'll have to double-check if the list returned from "projectFilterScan" is
valid (e.g. it does not contain completely new filters).

2) It does not look like it will scale well: tomorrow you'll want
ProjectableCursorableIndexScanThenAccessTable once you realize some of the
filtering can be checked against just the index contents. E.g. range scan
of the key, then some fuzzy filter logic on the key itself, then table
access for the rest with some more filters.

3) I am not sure that this kind of interface would solve more complex
cases: complex RexNodes (e.g. RexOver over RexOver over Rex..).
Ideally, filters should be split into the ones that can be implemented at
storage and the ones that cannot. I guess this has to be done in some rule,
and "CursorableTable" is just a tiny bit of it. The logic to split the
filters is not yet automagically solved by Calcite.
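
A rough sketch of the split from point 3, together with the sanity check
from point 1 (the table must not hand back filters it was never offered).
This is illustration only: filters are plain strings instead of RexNodes,
and FilterSplitSketch, Split, split and isValidResidual are made-up names,
not Calcite API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names exist in Calcite. */
final class FilterSplitSketch {

  /** Filters the storage accepts, and the rest that the engine must evaluate. */
  static final class Split {
    final List<String> pushed = new ArrayList<>();
    final List<String> residual = new ArrayList<>();
  }

  /** Partitions the filters by asking the storage which ones it supports. */
  static Split split(List<String> filters, Predicate<String> storageSupports) {
    Split result = new Split();
    for (String filter : filters) {
      if (storageSupports.test(filter)) {
        result.pushed.add(filter);
      } else {
        result.residual.add(filter);
      }
    }
    return result;
  }

  /** Returns true only if the table returned filters it was actually offered. */
  static boolean isValidResidual(List<String> offered, List<String> returnedByTable) {
    Set<String> known = new HashSet<>(offered);
    return known.containsAll(returnedByTable);
  }
}
```

Even in this toy form the engine has to run the check on every answer from
the table, which is the extra work point 1 is worried about.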

>* CursorableTable is an optional interface that can be implemented by
>any Table that allows you to get the results directly, without code
>generation, and without creating a TableAccessRel or similar.

How is that better than AbstractQueryableTable?
There is no need to do code generation if you need just a table scan.
There is no need to create separate TableAccessRel either.

Here's the example:
table definition:
https://github.com/vlsi/optiq-mat-plugin/blob/master/mat-plugin/src/com/github/vlsi/mat/optiq/HeapSchema.java#L40
table implementation:
https://github.com/vlsi/optiq-mat-plugin/blob/master/mat-plugin/src/com/github/vlsi/mat/optiq/InstanceByClassTable.java#L27

>It returns a Cursor, which is similar to a JDBC ResultSet but much
>simpler to implement,

We might just want a "cursor convention", but that is a separate issue
(e.g. getElementType -> Cursor.class | Object[].class |
CustomDefinedPOJO.class).
I would not like "cursorable" to be a feature of a "Cursorable" table.
That would confuse users: different kinds of tables would have subtle
differences, and it would be impossible to pick the right one.

> and is more efficient than an Iterator or
>Enumerable.

Can you please elaborate why Cursor would be so much better?
I see nothing specific to Cursor that would make it more efficient.

The downside of Cursor is the requirement to convert the values to suit
each and every getter (30+ methods in the Cursor$Accessor interface).
For instance, the data might be stored internally as "int", and Calcite
might call getString for some reason (what stops it?).
That might not be very efficient, and it might even surprise the developer
who implements the Cursor.

I bet no one would be able to implement Date/Timestamp fields correctly on
the first or even the second try (especially getting all the getters
right).
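
To make the conversion burden concrete, here is a minimal sketch of a
column stored as "int" behind a typed-getter accessor. AccessorSketch is a
hypothetical three-method stand-in, not the real Cursor$Accessor interface,
and IntColumnAccessor and moveTo are likewise invented for illustration.

```java
/** Hypothetical, heavily reduced stand-in for an accessor with many typed getters. */
interface AccessorSketch {
  int getInt();
  long getLong();
  String getString();
}

/** Column stored as primitive ints; every getter except getInt has to convert. */
final class IntColumnAccessor implements AccessorSketch {
  private final int[] column;
  private int row;

  IntColumnAccessor(int[] column) {
    this.column = column;
  }

  /** Positions the accessor on the given row. */
  void moveTo(int row) {
    this.row = row;
  }

  @Override public int getInt() {
    return column[row];                    // native type, no conversion
  }

  @Override public long getLong() {
    return column[row];                    // implicit widening conversion
  }

  @Override public String getString() {
    return Integer.toString(column[row]);  // allocates a new String per call
  }
}
```

With three getters this is trivial; the concern above is about repeating
it for 30+ getters and for types where the conversion is far less obvious.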

--
Vladimir Sitnikov
