calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabriel Reid <>
Subject Optimization of join between a small table and a large table
Date Wed, 07 Aug 2019 12:55:45 GMT

I'm currently working on a custom Calcite adapter, and I've got a situation
where I want to join a small table with a large table, basically just using
the small table to filter the large table. Conceptually, I'm talking about
something that is equivalent to the following query:

    from emp join dept on emp.deptid =
        where = 'Sales'

I've got converter rules which to push down filtering on all tables, which
make a big difference in performance. The above query current results in an
EnumerableHashJoin over a filtered scan over the 'dept' table, and an
unfiltered scan over the 'emp' table.

What I would like to accomplish is that this is converted into a (I think)
a nested loop join between 'dept' and emp, so that the filtered scan is
done once over 'dept', and then a filtered scan is done for each entry of
the 'dept' table using the 'id' value from that entry as a filter value on
the scan on the 'emp' table.

I've been able to get a nested loop join to be used, but I haven't managed
to have the 'id' values from the 'dept' table to be used to filter the
'emp' table. Instead, the full 'emp' table is scanned for each iteration of
the 'dept' table.

And now my questions: are my hopes/expectations here realistic, and/or is
there a much better/easier way of accomplishing the same thing? Is there
prior art somewhere within the Calcite code base or elsewhere? Or should
this just be working by default?

I would assume that this isn't that unusual of a situation, which is why I
was expecting that there would already be something like this somewhere (or
I'm doing the wrong thing), but I haven't managed to find any clear
pointers in any one direction yet.

Thanks in advance for any advice!

- Gabriel

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message