drill-user mailing list archives

From Jesse Yates <jesse.k.ya...@gmail.com>
Subject Re: rewriting table for joining logical partitions
Date Thu, 02 Jun 2016 04:50:45 GMT
I'm building a layer "on top"[1] that hides the details of accessing
underlying "partitions" (stored as individual tables) and picks the right
tables based on the query (time partitioned, so prune tables which won't
fulfill the query).
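
The time-based pruning described above could be sketched roughly as follows. This is a minimal illustration, not the actual handler: it assumes, hypothetically, that each partition table is tagged with a half-open time range [start, end) in epoch milliseconds, and `Partition` and `prune` are illustrative names that are not part of Drill's API.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPruning {
    // Hypothetical description of one partition table and the time range it covers: [startMs, endMs).
    record Partition(String table, long startMs, long endMs) {}

    // Keeps only partitions whose range overlaps the query's half-open range [queryStartMs, queryEndMs).
    static List<String> prune(List<Partition> partitions, long queryStartMs, long queryEndMs) {
        List<String> selected = new ArrayList<>();
        for (Partition p : partitions) {
            boolean overlaps = p.startMs() < queryEndMs && queryStartMs < p.endMs();
            if (overlaps) {
                selected.add(p.table());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Partition> partitions = List.of(
                new Partition("dfs.`/drill/table1.json`", 0, 100),
                new Partition("dfs.`/drill/table2.json`", 100, 200),
                new Partition("dfs.`/drill/table3.json`", 200, 300));
        // A query over [50, 150) touches table1 and table2 but not table3.
        System.out.println(prune(partitions, 50, 150));
    }
}
```

Half-open ranges keep adjacent partitions from both matching a query that starts exactly on a boundary.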

More concretely, let's take a set of files, each one fulfilling the role of
"table":
/drill
   /table1.json
   /table2.json
or, in Drill parlance, dfs.`/drill/table1.json` and dfs.`/drill/table2.json`.
The query that I want to make externally is SELECT * FROM EVENTS, and I
want to translate that internally to

SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN
dfs.`/drill/table2.json` AS t2 ON t1.id = t2.id

Executing that expanded (second) query from the top-level
java.sql.Connection works fine, hence my suspicion that the sub-tables are
not actually being found.
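
For reference, the SQL-level form of this rewrite (the one that runs fine from java.sql.Connection) could be generated from the selected partition tables along these lines. This is only a sketch: the `expand` helper is hypothetical, and the convention of joining every later partition back to the first one on `id` is an assumption, since the message only shows the two-table case. The real rewrite happens at the relational level in #toRel().

```java
import java.util.List;

public class EventsExpansion {
    // Builds "SELECT * FROM <t1> AS t1 JOIN <t2> AS t2 ON t1.id = t2.id ..." for the given tables,
    // joining every later table back to the first one on the known column.
    static String expand(List<String> tables, String joinColumn) {
        if (tables.isEmpty()) {
            throw new IllegalArgumentException("at least one partition table is required");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ")
                .append(tables.get(0)).append(" AS t1");
        for (int i = 1; i < tables.size(); i++) {
            String alias = "t" + (i + 1);
            sql.append(" JOIN ").append(tables.get(i)).append(" AS ").append(alias)
               .append(" ON t1.").append(joinColumn).append(" = ")
               .append(alias).append('.').append(joinColumn);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Prints the two-table query quoted above.
        System.out.println(expand(
                List.of("dfs.`/drill/table1.json`", "dfs.`/drill/table2.json`"), "id"));
    }
}
```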

Here is what I am trying to do in my table handler for the 'EVENTS' table
(gist: <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89>).

Does that help at all?

Thanks,
Jesse

[1] Where "on top", in this case, includes components inside Drill.

On Wed, Jun 1, 2016 at 9:27 PM Jinfeng Ni <jinfengni99@gmail.com> wrote:

> I'm not sure if I understand your problem correctly. Are you trying to
> build some non-SQL interface on top of Drill, to join a set of dynamic
> tables? Can you give more concrete example?
>
> When Drill handles a join over two dynamic tables, except for a * column
> query, the dynamic tables have a list of fields defined, since those
> fields are referenced in the query (even though the planner does not
> know each field's type). Therefore, the join condition will never be
> =($1,$1); it is resolved to references to the left/right tables'
> fields.
>
>
>
> On Wed, Jun 1, 2016 at 7:43 PM, Jesse Yates <jyates@apache.org> wrote:
> > Hi all,
> >
> > I'm trying to rewrite a query of a table (à la Table#toRel) to join a set
> > of dynamic (sub-)tables on a couple of known columns but am getting stuck
> > building the condition. The sub-tables are not part of the original query,
> > but rather are selected on the fly at logical query time.
> >
> > I can't use UNION ALL because, outside of a couple of known columns, the
> > remainder are completely dynamic.
> >
> > Using RelBuilder, I can construct the join via a series of scans followed by
> > joins on the known fields[1]. However, this only creates RexInputRefs, which
> > are not associated with the current reference numbering because the
> > sub-tables are not present in the original query. Thus, we get conditions
> > like =($1,$1), which appears to be trivially TRUE but should actually
> > reference the left/right tables' fields.
> >
> > I tried playing around with RexRangeRef and manually managing the field
> > offsets in the query (similar to SqlToRelConverter's Blackboard), but that
> > gets translated into an actual always-true condition and also fails the
> > JoinUtil#checkCartesianJoin check.
> >
> > If I construct the query via standard SQL at the top level (using known
> > tables), everything works fine, I think because of the reference numbering,
> > which I cannot access in #toRel().
> >
> > Any thoughts on the right way to go about this?
> >
> > Thanks much,
> > Jesse Yates
> >
> >
> > [1] Actually, this meant digging into RelBuilder
> > <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L309>,
> > since dynamic tables require the field names to already be set; RelBuilder
> > then uses the column name from the list to get the field index.
>
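
The numbering problem discussed in this thread comes down to how a join's row type is formed: in Calcite, the input references in a join condition index into the left input's fields followed by the right input's fields, so a right-side field's ordinal has to be shifted by the left input's field count. The plain-Java sketch below (no Calcite dependency; `JoinRefs` and `joinCondition` are hypothetical names, and the field lists are made up) resolves `id` by name within each input, as the RelBuilder footnote describes, and then applies that offset, so the two sides no longer collapse to =($1,$1). Calcite's RelBuilder offers `field(int inputCount, int inputOrdinal, String fieldName)` for this offsetting when both join inputs are on its stack; whether that alone resolves Drill's dynamic-table case is not established in this thread.

```java
import java.util.List;

public class JoinRefs {
    // Index of a named field within one input's own field list (a by-name lookup).
    static int localIndex(List<String> fields, String name) {
        int i = fields.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("no field " + name + " in " + fields);
        }
        return i;
    }

    // Renders an equi-join condition in the =($n, $m) style: right-side ordinals
    // are offset by the number of fields in the left input.
    static String joinCondition(List<String> left, List<String> right, String column) {
        int leftRef = localIndex(left, column);
        int rightRef = left.size() + localIndex(right, column);
        return "=($" + leftRef + ", $" + rightRef + ")";
    }

    public static void main(String[] args) {
        List<String> t1 = List.of("ts", "id", "a");
        List<String> t2 = List.of("ts", "id", "b", "c");
        // Without the offset both sides would be $1; with it the right side becomes $4.
        System.out.println(joinCondition(t1, t2, "id"));
    }
}
```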
