drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jesse Yates <jesse.k.ya...@gmail.com>
Subject Re: rewriting table for joining logical partitions
Date Thu, 02 Jun 2016 07:19:55 GMT
Its definitely getting closer, Thanks Jinfeng!

I end up with this set of plans
<https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-expanded-table-planning>.
However, the only column that is returned is * and it contains just the
joined id column, rather than the full column expansion.

This is in contrast to the raw file join physical plan
<https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-raw-sql-physical-plan>
which has an expanded projection condition.

I'm not sure how I can force the top projection to select correctly. Do i
inject it as a wrapper scan around the created join or create a new rule
that takes (project, join) and replaces it with the correct project
condition? Naturally, I'd have to type that rule/join to not just be the
logical so it can convert only this generated base case.

Thanks!
--Jesse

ps. Updated the gist
<https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-drill-to-rel-table-example-java>
to cover the offset changes for the degenerate 2 table case.

On Wed, Jun 1, 2016 at 10:26 PM Jinfeng Ni <jinfengni99@gmail.com> wrote:

> Seems to me that the reason you run into problem when build the join
> condition is that you are trying to compose the condition while adding
> the fields on-the-fly in [1]. This is different from what Drill is
> doing in Calcite library. For Drill, before construct the join
> condition (RexNode) t1.id = t2.id, SqlValidator will make sure those
> fields exists in the table.  For dynamic table, such check will lead
> to one additional field in table's rowType. After validation, when
> Calcite builds join condition in SqlToRelConverter, it already knows
> the field list for each input table (at that time, table's rowType
> should be immutable), and hence it can adjust the reference to the
> right table field, by adding the # of LHS fields to the index. That
> is, =($1, $1) would become =($1, $3).
>
> I feel you may follow such logic, by calling field(table1, fieldName)
> on each side of join, before construct the join condition.
>
>
> [1]
> https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-drill-to-rel-table-example-java-L94-L95
>
> On Wed, Jun 1, 2016 at 9:50 PM, Jesse Yates <jesse.k.yates@gmail.com>
> wrote:
> > I'm building a layer "on top"[1] that hides the details of accessing
> > underlying "partitions" (stored as individual tables) and picks the right
> > tables based on the query (time partitioned, so prune tables which won't
> > fulfill the query).
> >
> > More concretely, lets take a set of files, each one fulfilling the role
> of
> > "table":
> > /drill
> >    /table1.json
> >    /table2.json
> > or in drill parlance, *dfs.`/drill/table1.json`* and
> *dfs.`/drill/table2.json`.
> > T*he query that I want to make externally is:* SELECT * FROM EVENTS *and
> I
> > want to translate that internally to
> >
> > *SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN
> > dfs.`/drill/table2.json`AS t2 ON t1.id <http://t1.id> = t2.id
> > <http://t2.id>.*Executing that expanded (second) query from the
> top-level
> > java.sql.Connection executes fine, hence my thoughts about the sub-table
> > not actually being found.
> >
> > Here is what I am trying to do in my table handler - the 'EVENTS' table (
> > gist <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89>).
> >
> > Does that help at all?
> >
> > Thanks,
> > Jesse
> >
> > [1] Where on top in this case includes components inside Drill.
> >
> > On Wed, Jun 1, 2016 at 9:27 PM Jinfeng Ni <jinfengni99@gmail.com> wrote:
> >
> >> I'm not sure if I understand your problem correctly. Are you trying to
> >> build some non-SQL interface on top of Drill, to join a set of dynamic
> >> tables? Can you give more concrete example?
> >>
> >> When Drill handles join over two dynamic tables,  except for * column
> >> query, the dynamic tables have a list of fields defined, since those
> >> fields are referred in the query (even though the planner does not
> >> know each field's type). Therefore, the join condition will never be
> >> =($1,$1); it would be resolved to reference to left/right tables's
> >> fields.
> >>
> >>
> >>
> >> On Wed, Jun 1, 2016 at 7:43 PM, Jesse Yates <jyates@apache.org> wrote:
> >> > Hi all,
> >> >
> >> > I'm trying to rewrite a query of a table (ala Table#toRel) to join a
> set
> >> of
> >> > dynamic (sub-)tables on a couple of known columns but am getting stuck
> >> > building the condition. The sub-tables are not part of the original
> >> query,
> >> > but rather selected on-the-fly at logical query time.
> >> >
> >> > I can't use UNION-ALL because, outside of a couple of known columns,
> the
> >> > remainder are completely dynamic.
> >> >
> >> > Using RelBuilder I can construct the join via a series of scans and
> then
> >> > joins on the known fields[1]. However, this only creates RelInputRefs
> >> which
> >> > are not at all associated with the current ref numbering because the
> the
> >> > sub-tables are not present in the original query. Thus, we get
> conditions
> >> > like:
> >> > (=($1,$1), which looks appears TRUE but actually should reference the
> >> > left/right tables' fields.
> >> >
> >> > I tried playing around with RelRangeRef and manually managing the
> field
> >> > offsets in query (similar to BlackBoard), but that call gets
> translated
> >> > into an actually ALWAYS-TRUE condition and also fails the
> >> > JoinUtil#checkCartesianJoin case.
> >> >
> >> > If I construct the query via standard SQL at the top level (using
> known
> >> > tables), everything works fine, I think because of the ref-numbering
> to
> >> > which I cannot get access in #toRel().
> >> >
> >> > Any thoughts on the right way to go about this?
> >> >
> >> > Thanks much,
> >> > Jesse Yates
> >> >
> >> >
> >> > [1] Actually, this meant digging into RelBuilder
> >> > <
> >>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L309
> >> >
> >> > since dynamic tables require the field names to already be set and
> then
> >> > uses the column name from the list to get the field index
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message