drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: updates to logical plan spec
Date Sat, 13 Oct 2012 13:29:33 GMT
On Sat, Oct 13, 2012 at 2:03 AM, Julian Hyde <julianhyde@gmail.com> wrote:

> I guess I've been thinking about something a little different. I've been
> thinking about what would be an appropriate algebra to internally
> represent, manipulate, and optimize Drill queries.

I don't think that we are far apart.

> My conclusion is that the best candidate is relational algebra, augmented
> with data types that allow collections of nested records and with "explode"
> and "implode" operators. ("explode" takes a record with nested records and
> converts it to a sequence of flat records, plus a "location" value that
> indicates how the nested record fits into the parent; "implode" is the
> inverse: it takes a sequence of flat records with a "location" field and
> converts into a nested record.)

This is very close to what I now have in the plan document.

> ...
> The algebra is (probably) higher level than what Ted calls a "logical
> plan". His operators produce two outputs, and while that makes perfect
> sense for physical operators, it is difficult to reason about.

The multiple outputs fall into two cases:

1) a secondary output that is simply a reference to a new value that
doesn't yet have a name.

2) explode provides a reference to the original data stream.  If you think
that flatten (aka implode) doesn't need that, then this isn't necessary.

I don't think that cases of (1) affect reasoning about the DAG and I think
that the unique case of (2) may be something we can eliminate.

To fix up DrQL's "where" operator, we convert it to "explode" followed by
> "filter" followed by "implode". To fix up aggregate "within", we apply
> "explode" then aggregate. We find that we never need to operate on trees.
> If we need to operate on a tree, we explode into several records, apply
> relational operators on those records, and if necessary implode back again.
> We're operating in relational algebra.

I think that this works with the current plan document.

> Am I over-engineering here? It's possible. Maybe Drill doesn't need query
> optimization. Maybe queries can go straight from a DrQL parse tree to a DAG
> of operators using a straightforward mapping.

If so, the query optimizer will be very simple.  But I am pretty sure that
getting delayed record assembly to work right will require reasoning such
as you suggest.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message