drill-user mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: rewriting table for joining logical partitions
Date Thu, 02 Jun 2016 05:26:51 GMT
It seems to me that the reason you run into problems when building the join
condition is that you are trying to compose the condition while adding
the fields on the fly in [1]. This is different from what Drill does
with the Calcite library. In Drill, before the join condition (RexNode)
t1.id = t2.id is constructed, SqlValidator makes sure those fields
exist in the table. For a dynamic table, that check adds each referenced
field to the table's rowType. After validation, when Calcite builds the
join condition in SqlToRelConverter, it already knows the field list of
each input table (at that point, the table's rowType should be
immutable), and hence it can adjust each reference to point at the right
table field by adding the number of LHS fields to the index. That is,
with two fields on the left side, =($1, $1) would become =($1, $3).
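To make that index adjustment concrete, here is a small, self-contained Java model of it. It does not use Calcite; the class and method names (JoinFieldOffset, combinedIndex) and the field lists are hypothetical. It only assumes what is described above: the join's row type is the left input's fields followed by the right input's fields, so a right-hand field at local index j is referenced as $(leftCount + j).

```java
import java.util.List;

// Toy model (not Calcite code) of how a join condition addresses fields
// once both inputs' row types are known and fixed.
public class JoinFieldOffset {

    // Index of fieldName in the combined (left ++ right) row type of a join.
    static int combinedIndex(List<String> left, List<String> right,
                             boolean fromRight, String fieldName) {
        List<String> side = fromRight ? right : left;
        int local = side.indexOf(fieldName);
        if (local < 0) {
            throw new IllegalArgumentException("unknown field: " + fieldName);
        }
        // Right-hand references are shifted past every left-hand field.
        return fromRight ? left.size() + local : local;
    }

    public static void main(String[] args) {
        // Hypothetical validated row types: "id" is local field $1 in each table.
        List<String> t1 = List.of("ts", "id");
        List<String> t2 = List.of("ts", "id");
        int l = combinedIndex(t1, t2, false, "id");
        int r = combinedIndex(t1, t2, true, "id");
        System.out.println("=($" + l + ", $" + r + ")"); // prints =($1, $3)
    }
}
```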

I think you could follow the same logic: call field(table1, fieldName)
on each side of the join before constructing the join condition.
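A rough model of that ordering follows, again in plain Java rather than the Drill or Calcite API; DynamicTable, naiveCondition and resolvedCondition are hypothetical names. DynamicTable.field() stands in for the validator check on a dynamic table by appending unknown fields to the row type. Resolving each side independently yields a condition whose operands point at the same ordinal, while registering every join key on both tables first, and only then offsetting the right-hand indexes, yields correct references.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical classes, not Drill or Calcite API) of a dynamic
// table whose row type grows whenever an unknown field is looked up.
public class TwoPhaseJoinCondition {

    static final class DynamicTable {
        final List<String> fields = new ArrayList<>();

        // Stands in for validation against a dynamic table: an unknown field
        // is appended to the row type; the field's local index is returned.
        int field(String fieldName) {
            int i = fields.indexOf(fieldName);
            if (i >= 0) {
                return i;
            }
            fields.add(fieldName);
            return fields.size() - 1;
        }
    }

    // Naive: each side resolves its key independently and the local indexes
    // are used as-is, so both operands point at the same ordinal.
    static String naiveCondition(DynamicTable left, DynamicTable right, String key) {
        return "=($" + left.field(key) + ", $" + right.field(key) + ")";
    }

    // Two-phase: first register every key on both tables (row types are then
    // fixed), then shift right-hand indexes past all left-hand fields.
    static String resolvedCondition(DynamicTable left, DynamicTable right, List<String> keys) {
        for (String key : keys) {
            left.field(key);
            right.field(key);
        }
        int leftCount = left.fields.size();
        List<String> conditions = new ArrayList<>();
        for (String key : keys) {
            int l = left.fields.indexOf(key);
            int r = leftCount + right.fields.indexOf(key);
            conditions.add("=($" + l + ", $" + r + ")");
        }
        return conditions.size() == 1
            ? conditions.get(0)
            : "AND(" + String.join(", ", conditions) + ")";
    }

    public static void main(String[] args) {
        // Prints =($0, $0): both operands refer to the same ordinal.
        System.out.println(naiveCondition(new DynamicTable(), new DynamicTable(), "id"));
        // Prints AND(=($0, $2), =($1, $3)).
        System.out.println(resolvedCondition(new DynamicTable(), new DynamicTable(), List.of("id", "ts")));
    }
}
```

Registering all keys before computing any offset matters when there is more than one join key: adding a second key to the left table after the first right-hand index was computed would change the left field count and invalidate that index.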

[1] https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-drill-to-rel-table-example-java-L94-L95

On Wed, Jun 1, 2016 at 9:50 PM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> I'm building a layer "on top"[1] that hides the details of accessing
> underlying "partitions" (stored as individual tables) and picks the right
> tables based on the query (time partitioned, so prune tables which won't
> fulfill the query).
> More concretely, let's take a set of files, each one fulfilling the role of
> a "table":
> /drill
>    /table1.json
>    /table2.json
> or, in Drill parlance, dfs.`/drill/table1.json` and dfs.`/drill/table2.json`.
> The query that I want to make externally is SELECT * FROM EVENTS, and I
> want to translate that internally to
> SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN
> dfs.`/drill/table2.json` AS t2 ON t1.id = t2.id.
> Executing that expanded (second) query from the top-level
> java.sql.Connection works fine, hence my suspicion that the sub-table
> is not actually being found.
> Here is what I am trying to do in my table handler for the 'EVENTS' table
> (gist: https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89).
> Does that help at all?
> Thanks,
> Jesse
> [1] "On top" in this case includes components inside Drill.
> On Wed, Jun 1, 2016 at 9:27 PM Jinfeng Ni <jinfengni99@gmail.com> wrote:
>> I'm not sure I understand your problem correctly. Are you trying to
>> build some non-SQL interface on top of Drill to join a set of dynamic
>> tables? Can you give a more concrete example?
>> When Drill handles a join over two dynamic tables, except for a * column
>> query, the dynamic tables have a list of fields defined, since those
>> fields are referred to in the query (even though the planner does not
>> know each field's type). Therefore, the join condition will never be
>> =($1,$1); it will be resolved to references to the left/right tables'
>> fields.
>> On Wed, Jun 1, 2016 at 7:43 PM, Jesse Yates <jyates@apache.org> wrote:
>> > Hi all,
>> >
>> > I'm trying to rewrite a query of a table (à la Table#toRel) to join a set of
>> > dynamic (sub-)tables on a couple of known columns, but am getting stuck
>> > building the condition. The sub-tables are not part of the original query,
>> > but rather selected on the fly at logical query time.
>> >
>> > I can't use UNION ALL because, outside of a couple of known columns, the
>> > remaining columns are completely dynamic.
>> >
>> > Using RelBuilder I can construct the join via a series of scans and then
>> > joins on the known fields[1]. However, this only creates RexInputRefs, which
>> > are not at all associated with the current ref numbering, because the
>> > sub-tables are not present in the original query. Thus, we get conditions
>> > like =($1,$1), which appears to be always TRUE but actually should
>> > reference the left/right tables' fields.
>> >
>> > I tried playing around with RexRangeRef and manually managing the field
>> > offsets in the query (similar to Blackboard), but that call gets translated
>> > into an actual ALWAYS-TRUE condition and also fails the
>> > JoinUtil#checkCartesianJoin check.
>> >
>> > If I construct the query via standard SQL at the top level (using known
>> > tables), everything works fine, I think because of the ref numbering,
>> > which I cannot access in #toRel().
>> >
>> > Any thoughts on the right way to go about this?
>> >
>> > Thanks much,
>> > Jesse Yates
>> >
>> >
>> > [1] Actually, this meant digging into RelBuilder
>> > (https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L309),
>> > since dynamic tables require the field names to already be set; RelBuilder
>> > then uses the column name from that list to get the field index.
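For reference, the SQL-level expansion Jesse describes (SELECT * FROM EVENTS rewritten into a join over the selected time partitions) can be sketched as follows. This is a hypothetical illustration, not Drill code: the Partition class, the millisecond time bounds, the third file /drill/table3.json, and joining every partition to t1 on id are all assumptions. It only produces the query text that, per the thread, already works from a top-level java.sql.Connection; it does not address the RelBuilder field-reference problem.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Drill API): prune time partitions that cannot
// satisfy the query range, then emit the equivalent join SQL over the rest.
public class EventsExpansion {

    static final class Partition {
        final String path;      // assumed file path, e.g. /drill/table1.json
        final long startMillis; // inclusive lower time bound (assumed)
        final long endMillis;   // exclusive upper time bound (assumed)

        Partition(String path, long startMillis, long endMillis) {
            this.path = path;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        // True if [startMillis, endMillis) overlaps the query range [from, to).
        boolean overlaps(long from, long to) {
            return startMillis < to && from < endMillis;
        }
    }

    // Keeps only the partitions whose time range overlaps the query range.
    static List<Partition> prune(List<Partition> all, long from, long to) {
        List<Partition> kept = new ArrayList<>();
        for (Partition p : all) {
            if (p.overlaps(from, to)) {
                kept.add(p);
            }
        }
        return kept;
    }

    // Joins every kept partition to the first one (alias t1) on the join key.
    static String expand(List<Partition> parts, String key) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("no partitions selected");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM dfs.`")
            .append(parts.get(0).path).append("` AS t1");
        for (int i = 1; i < parts.size(); i++) {
            String alias = "t" + (i + 1);
            sql.append(" JOIN dfs.`").append(parts.get(i).path).append("` AS ").append(alias)
               .append(" ON t1.").append(key).append(" = ").append(alias).append('.').append(key);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<Partition> all = List.of(
            new Partition("/drill/table1.json", 0, 100),
            new Partition("/drill/table2.json", 100, 200),
            new Partition("/drill/table3.json", 200, 300));
        // table3 is pruned; prints the two-table join from the thread:
        // SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN dfs.`/drill/table2.json` AS t2 ON t1.id = t2.id
        System.out.println(expand(prune(all, 50, 150), "id"));
    }
}
```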
