drill-dev mailing list archives

From Andrew Psaltis <Andrew.Psal...@Webtrends.com>
Subject Re: First pass at a reference interpreter
Date Tue, 15 Jan 2013 00:07:33 GMT
This is great to see. Can't wait to start using and contributing.  

In good health,

Sent from my GPU powered iPhone

On Jan 14, 2013, at 16:56, "Jacques Nadeau" <jacques.drill@gmail.com> wrote:

> I've been pulling together a reference logical plan interpreter.  I'm
> working with Ted to get it inside the Drill sandbox.  For now, you can find
> it on my repo at https://github.com/jacques-n/incubator-drill (prototype
> branch)
> The goals of the reference interpreter are:
>   - To provide a simple way to run a Logical Plan against some sample data
>   and get back the expected result
>   - Allow work to start on the parsers while we scale up the performance
>   and capabilities of the execution engine and optimizer.
>   - Allow evaluation work on particular technical approaches such as
>   exploring the impact of hierarchical and schema-less data on query
>   evaluation.
> These goals do not include performance, memory handling, or
> efficiency.  Currently,
> the interpreter is a single node/thread process.  This will change shortly
> so that it can also run as a clustered process.
> The entry point is inside the /sandbox/prototype/exec/ref module:
> org.apache.drill.exec.ref.ReferenceInterpreter.main();  The example program
> utilizes two resources: simple-plan.json and donuts.json and outputs data
> to /opt/data/out.json.
> Some of the things that 'work':
>   - Read/write basic json.
>   - ROPs (reference operators): Filter, Transform, Group, Aggregate
>   (simple), Order, Union.
>   - Example aggregate and basic functions including sum, count, multiply,
>   add, compare, equals.
> Basic glossary/concepts (we'll get this on the wiki/javadocs):
>   - LOP: Logical Operator.  An implementation-agnostic data flow operator
>   utilized by the Logical Plan.
>   - ROP: Reference Operator: A reference operator implementation that
>   pairs with a LOP.
>   - FunctionDefinition: A definition of a particular function.  Describes
>   a set of aliases, an allowable set of input arguments and an interface that
>   will attempt to determine output type.
>   - BasicEvaluator: An implementation of a particular non-aggregate
>   expression.  Receives a record pointer at creation time. Returns a
>   DataValue.
>   - AggregateEvaluator: An implementation of a particular aggregating
>   function.  Is provided a record pointer at creation time.  Expects regular
>   calls to addRecord() followed by a call to eval() which provides the
>   aggregate value.
>   - DataValue: A pointer to a particular data value.  Implementation
>   classes include things like ScalarLong, ScalarBytes, SimpleMapValue and
>   SimpleArrayValue.
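
The evaluator lifecycle in the glossary above (bind to a record pointer at creation, call addRecord() per record, then eval()) might look roughly like the sketch below. Everything here is an assumption for illustration, not the code in the prototype branch: the class EvaluatorSketch, the RecordPointer stand-in, the FieldValue and Sum implementations, and the method signatures are all hypothetical. The sketch also returns a plain long where the message says evaluators return a DataValue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; these are NOT the actual Drill interfaces.
public class EvaluatorSketch {

    /** Stand-in for the "record pointer": a mutable reference to the current record. */
    static final class RecordPointer {
        private Map<String, Long> current = Collections.emptyMap();
        void set(Map<String, Long> record) { current = record; }
        Long get(String field) { return current.get(field); }
    }

    /** Non-aggregate expression, evaluated against whatever record the pointer holds. */
    interface BasicEvaluator {
        long eval();
    }

    /** Aggregating function: addRecord() once per record, then eval() for the result. */
    interface AggregateEvaluator {
        void addRecord();
        long eval();
    }

    /** Reads one field; receives the record pointer at creation time. */
    static final class FieldValue implements BasicEvaluator {
        private final RecordPointer pointer;
        private final String field;
        FieldValue(RecordPointer pointer, String field) {
            this.pointer = pointer;
            this.field = field;
        }
        public long eval() {
            Long v = pointer.get(field);
            return v == null ? 0L : v; // treat a missing field as 0 in this sketch
        }
    }

    /** Sums the value of a child expression across all records seen so far. */
    static final class Sum implements AggregateEvaluator {
        private final BasicEvaluator child;
        private long total;
        Sum(BasicEvaluator child) { this.child = child; }
        public void addRecord() { total += child.eval(); }
        public long eval() { return total; }
    }

    /** Drives the lifecycle: move the pointer, call addRecord(), finally eval(). */
    static long sumField(List<Map<String, Long>> records, String field) {
        RecordPointer pointer = new RecordPointer();
        AggregateEvaluator sum = new Sum(new FieldValue(pointer, field));
        for (Map<String, Long> record : records) {
            pointer.set(record);
            sum.addRecord();
        }
        return sum.eval();
    }

    public static void main(String[] args) {
        List<Map<String, Long>> donuts = Arrays.asList(
                Collections.singletonMap("sales", 35L),
                Collections.singletonMap("sales", 145L),
                Collections.singletonMap("sales", 20L));
        System.out.println(sumField(donuts, "sales")); // prints 200
    }
}
```

The point of the shape is that evaluators never receive a record as an argument; they hold the pointer and the caller moves it.
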
> The standard record iterator utilized between each ROP utilizes the
> org.apache.drill.exec.ref.RecordIterator interface.  This is somewhat
> inspired by the AttributeSource concepts from within the Lucene project.
> (I'm planning to extend these concepts all the way to the individual
> DataValues.)
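
As a rough illustration of the record-iterator idea above, here is a minimal sketch of ROPs chained through a shared, reused record pointer, which is how I read the AttributeSource comparison. The real org.apache.drill.exec.ref.RecordIterator interface may look quite different; SketchRecordIterator, ListSource, MinFilter, and drain are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only; not the actual Drill RecordIterator interface.
public class IteratorSketch {

    /** One reusable holder for the current record (a single long in this sketch). */
    static final class RecordPointer {
        long value;
    }

    /** Assumed shape: fetch the pointer once, then call next() to advance it. */
    interface SketchRecordIterator {
        RecordPointer getRecordPointer();
        boolean next();
    }

    /** Leaf operator: feeds values from a list into its record pointer. */
    static final class ListSource implements SketchRecordIterator {
        private final RecordPointer pointer = new RecordPointer();
        private final Iterator<Long> values;
        ListSource(List<Long> values) { this.values = values.iterator(); }
        public RecordPointer getRecordPointer() { return pointer; }
        public boolean next() {
            if (!values.hasNext()) {
                return false;
            }
            pointer.value = values.next();
            return true;
        }
    }

    /** Filter-style operator: skips incoming records below a minimum value. */
    static final class MinFilter implements SketchRecordIterator {
        private final SketchRecordIterator incoming;
        private final long min;
        MinFilter(SketchRecordIterator incoming, long min) {
            this.incoming = incoming;
            this.min = min;
        }
        // The filter does not copy records; it exposes the upstream pointer.
        public RecordPointer getRecordPointer() { return incoming.getRecordPointer(); }
        public boolean next() {
            while (incoming.next()) {
                if (incoming.getRecordPointer().value >= min) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Pulls every record through the chain and collects the values. */
    static List<Long> drain(SketchRecordIterator it) {
        List<Long> out = new ArrayList<>();
        RecordPointer pointer = it.getRecordPointer(); // fetched once, reused
        while (it.next()) {
            out.add(pointer.value);
        }
        return out;
    }

    public static void main(String[] args) {
        SketchRecordIterator plan =
                new MinFilter(new ListSource(Arrays.asList(5L, 50L, 12L, 80L)), 12L);
        System.out.println(drain(plan)); // prints [50, 12, 80]
    }
}
```
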
> My next goals are to add tests, finish adding ROPs, add local and remote
> exchange nodes (parallelization), add a bunch of documentation and extract
> out the Execution plan as a separate intermediate representation.
> It needs a lot more evaluators to be a true reference interpreter (as well
> as the rest of the ROPs).  The existing ones can be utilized as prototypes.
> Anyone interested in ripping through a bunch of additional evaluators and
> associated FunctionDefinitions?
