drill-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject updates to logical plan spec
Date Sat, 13 Oct 2012 02:37:39 GMT
I talked to Jason some more.  He had some very good suggestions.

a) some operators need to have multiple outputs.  For instance, the group
operator needs to output both the main data stream and a reference to the
grouped field.

b) what Julian was calling nest/unnest is more naturally called explode and
flatten.  The idea is that some field has a list-like value, and explode
emits each of those values.  Explode actually has two outputs: one is the
original input and the other is the exploded sequence.  The exploded
sequence can be the input to a sub-DAG which does whatever we want with it,
typically aggregating it.  The flatten operator then splices the output of
the sub-DAG back into the original record that had the list-like value.
The flatten operator also has two outputs: the main data flow and a
reference to the output of the sub-DAG within the main data.

This style handles all of the normal grouping and aggregating we want to
do, and it also handles all of Dremel's WITHIN syntax.
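
A minimal sketch of the explode / sub-DAG / flatten flow described above,
in Python.  Everything here is an assumption made for illustration: records
are plain dicts, the sub-DAG is a simple per-record sum, and the names
explode, sum_per_record, flatten, clicks, and clicks_sum are hypothetical
and not taken from the logical plan spec.

```python
from collections import defaultdict


def explode(records, field):
    """Two outputs: the untouched input, and one (record_index, value)
    pair for every element of the list-like field."""
    main = list(records)
    exploded = [(i, v) for i, rec in enumerate(main) for v in rec.get(field, [])]
    return main, exploded


def sum_per_record(exploded):
    """Stand-in for the sub-DAG: aggregate the exploded values per record."""
    totals = defaultdict(int)
    for record_index, value in exploded:
        totals[record_index] += value
    return dict(totals)


def flatten(main, sub_result, out_field):
    """Two outputs: the main flow with the sub-DAG result spliced into each
    record, and a reference (here just the field name) to that result."""
    spliced = [{**rec, out_field: sub_result.get(i)} for i, rec in enumerate(main)]
    return spliced, out_field


records = [{"id": "a", "clicks": [1, 2, 3]}, {"id": "b", "clicks": []}]
main, exploded = explode(records, "clicks")
rows, ref = flatten(main, sum_per_record(exploded), "clicks_sum")
```

In this sketch a record whose list is empty gets None for the spliced
field, since the sub-DAG never saw a value for it.  Whether the real
operators should behave that way is an open choice, not something the
message settles.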

I also realized a few things:

1) the bind needs to be rooted in some data source so that we can
understand scoping relative to schemas

2) there is an important difference between two separate outputs of a DAG
element and a single output that goes two places.

3) everywhere I wanted to inject an output field name, multiple outputs
can do the job instead
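
One way to picture point 2 is to give every DAG element named output ports
and to let edges refer to a (node, port) pair.  The sketch below is only an
illustration under that assumption: the Port and Plan classes and the node
and port names (scan, group, data, key, and so on) are hypothetical, not
part of the spec.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    node: str   # name of the DAG element that produces the output
    name: str   # name of the output on that element


@dataclass
class Plan:
    edges: list = field(default_factory=list)  # (source Port, consumer node)

    def connect(self, source, consumer):
        self.edges.append((source, consumer))

    def fan_out(self, port):
        """Consumers of one output: a single output going several places."""
        return [consumer for source, consumer in self.edges if source == port]

    def output_ports(self, node):
        """Distinct outputs of one element: genuinely separate outputs."""
        return sorted({source.name for source, _ in self.edges if source.node == node})


plan = Plan()
# One output that goes two places.
plan.connect(Port("scan", "out"), "filter")
plan.connect(Port("scan", "out"), "count")
# Two separate outputs of one element.
plan.connect(Port("group", "data"), "aggregate")
plan.connect(Port("group", "key"), "project")
```

Here scan has one port with two consumers, while group has two ports with
one consumer each, so the two cases stay distinguishable in the plan
itself.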

I think that the logical plan spec is ready for two things, both of which
can be done by somebody other than me:

A - We can now start trying to convert an abstract syntax tree from
Dremel-ish source into a logical plan

B - We can implement a toy interpreter for the logical plan that transforms
sequences of trees.
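
As a starting point for B, a toy interpreter could look roughly like the
sketch below.  It is deliberately simplified and rests on assumptions that
are not in the spec: the plan is a linear list of steps rather than a DAG,
each tree is a nested dict, and the step format and the filter and project
operators are invented here purely for illustration.

```python
def op_filter(stream, step):
    # Keep only trees whose top-level field equals the given value.
    return (tree for tree in stream if tree.get(step["field"]) == step["equals"])


def op_project(stream, step):
    # Keep only the listed top-level fields of each tree.
    return ({k: tree[k] for k in step["fields"] if k in tree} for tree in stream)


# Hypothetical operator table, keyed by the "op" name used in a plan step.
OPS = {"filter": op_filter, "project": op_project}


def run_plan(plan, trees):
    """Apply each step of a linear plan, in order, to a sequence of trees."""
    stream = iter(trees)
    for step in plan:
        stream = OPS[step["op"]](stream, step)
    return list(stream)


plan = [
    {"op": "filter", "field": "country", "equals": "US"},
    {"op": "project", "fields": ["user"]},
]
trees = [
    {"user": {"name": "ann"}, "country": "US"},
    {"user": {"name": "bob"}, "country": "DE"},
]
result = run_plan(plan, trees)
```

Supporting operators with multiple outputs would mean replacing the single
stream with something keyed by (node, output), which this sketch does not
attempt.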
