This is great to see. Can't wait to start using and contributing.
In good health,
Andrew
On Jan 14, 2013, at 16:56, "Jacques Nadeau" <jacques.drill@gmail.com> wrote:
> I've been pulling together a reference logical plan interpreter. I'm
> working with Ted to get it into the Drill sandbox. For now, you can find
> it on my repo at https://github.com/jacques-n/incubator-drill (prototype
> branch).
>
> The goals of the reference interpreter are:
>
> - Provide a simple way to run a Logical Plan against some sample data
> and get back the expected result.
> - Allow work to start on the parsers while we scale up the performance
> and capabilities of the execution engine and optimizer.
> - Allow evaluation of particular technical approaches, such as
> exploring the impact of hierarchical and schemaless data on query
> evaluation.
>
> These goals do not include performance, memory handling, or efficiency.
> Currently, the interpreter runs as a single-node, single-threaded process.
> This will change shortly so that it can also run as a clustered process.
>
> The entry point is in the /sandbox/prototype/exec/ref module:
> org.apache.drill.exec.ref.ReferenceInterpreter.main(). The example program
> uses two resources, simple-plan.json and donuts.json, and writes its
> output to /opt/data/out.json.
>
> Some of the things that 'work':
>
> - Read/write basic JSON.
> - ROPs (reference operators): Filter, Transform, Group, Aggregate
> (simple), Order, Union.
> - Example aggregate and basic functions, including sum, count, multiply,
> add, compare, and equals.
>
> Basic glossary and concepts (we'll move this to the wiki/Javadocs):
>
> - LOP: Logical Operator. An implementation-agnostic dataflow operator
> used by the Logical Plan.
> - ROP: Reference Operator. A reference operator implementation that
> pairs with a LOP.
> - FunctionDefinition: A definition of a particular function. Describes
> a set of aliases, an allowable set of input arguments, and an interface
> that attempts to determine the output type.
> - BasicEvaluator: An implementation of a particular non-aggregate
> expression. Receives a record pointer at creation time and returns a
> DataValue.
> - AggregateEvaluator: An implementation of a particular aggregating
> function. Receives a record pointer at creation time. Expects repeated
> calls to addRecord() followed by a call to eval(), which returns the
> aggregate value.
> - DataValue: A pointer to a particular data value. Implementation
> classes include ScalarLong, ScalarBytes, SimpleMapValue and
> SimpleArrayValue.
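>
> A minimal, hypothetical Java sketch of how the two evaluator contracts
> fit together. RecordPointer, MultiplyEvaluator and SumAggregator, and the
> ScalarLong used here, are simplified stand-ins invented for illustration,
> not the interpreter's actual classes. The record pointer is assumed to be
> a mutable holder that the caller moves from record to record; the
> aggregate reaches it through its child evaluator.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the evaluator contracts from the glossary.
// None of these types are the real Drill classes; names are illustrative.
class EvaluatorSketch {

    /** Simplified stand-in for a scalar DataValue such as ScalarLong. */
    static final class ScalarLong {
        final long value;
        ScalarLong(long value) { this.value = value; }
    }

    /** Assumed "record pointer": evaluators keep it, the caller moves it between records. */
    static final class RecordPointer {
        private Map<String, Long> current;
        void set(Map<String, Long> record) { current = record; }
        long getLong(String field) { return current.get(field); }
    }

    /** Non-aggregate expression: bound to a pointer at creation, evaluated per record. */
    interface BasicEvaluator {
        ScalarLong eval();
    }

    /** Aggregate: addRecord() for each record, then eval() for the final value. */
    interface AggregateEvaluator {
        void addRecord();
        ScalarLong eval();
    }

    /** multiply(left, right) over two numeric fields of the current record. */
    static final class MultiplyEvaluator implements BasicEvaluator {
        private final RecordPointer record;
        private final String left;
        private final String right;

        MultiplyEvaluator(RecordPointer record, String left, String right) {
            this.record = record;
            this.left = left;
            this.right = right;
        }

        @Override
        public ScalarLong eval() {
            return new ScalarLong(record.getLong(left) * record.getLong(right));
        }
    }

    /** sum(child): accumulates the child's value for every record added. */
    static final class SumAggregator implements AggregateEvaluator {
        private final BasicEvaluator child;
        private long sum;

        SumAggregator(BasicEvaluator child) { this.child = child; }

        @Override
        public void addRecord() { sum += child.eval().value; }

        @Override
        public ScalarLong eval() { return new ScalarLong(sum); }
    }

    public static void main(String[] args) {
        RecordPointer pointer = new RecordPointer();
        AggregateEvaluator total = new SumAggregator(new MultiplyEvaluator(pointer, "qty", "price"));
        List<Map<String, Long>> records = List.of(
                Map.of("qty", 2L, "price", 3L),
                Map.of("qty", 4L, "price", 5L));
        for (Map<String, Long> record : records) {
            pointer.set(record);   // move the shared pointer; evaluators now see this record
            total.addRecord();
        }
        System.out.println(total.eval().value);   // prints 26 (2*3 + 4*5)
    }
}
```

> In this sketch the evaluators hold the pointer instead of receiving each
> record as an argument, so the evaluator tree is built once and simply
> re-evaluated as the pointer advances.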
>
> Records flow from one ROP to the next through the
> org.apache.drill.exec.ref.RecordIterator interface. Its design is loosely
> inspired by the AttributeSource concepts in the Lucene project. (I'm
> planning to extend these concepts all the way down to the individual
> DataValues.)
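>
> One way to read the AttributeSource analogy, sketched hypothetically in
> Java: in Lucene, a consumer looks up a TokenStream's attributes once and
> then calls incrementToken(), which refills those same attribute objects
> in place. The sketch applies that idea to records: the consumer fetches a
> single RecordPointer once, and each next() call repoints it. NextOutcome,
> ListScan and FilterRop are illustrative assumptions, not the actual
> RecordIterator API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch only: not Drill's actual RecordIterator interface.
class RecordIteratorSketch {

    enum NextOutcome { NONE_LEFT, INCREMENTED }

    /** Mutable view of "the current record", shared by reference (cf. Lucene attributes). */
    static final class RecordPointer {
        private Map<String, Long> current;
        void set(Map<String, Long> record) { current = record; }
        long getLong(String field) { return current.get(field); }
    }

    interface RecordIterator {
        RecordPointer getRecordPointer();   // fetched once by the consumer
        NextOutcome next();                 // repoints the same object at the next record
    }

    /** Leaf scan over in-memory records (stands in for a JSON reader). */
    static final class ListScan implements RecordIterator {
        private final Iterator<Map<String, Long>> source;
        private final RecordPointer pointer = new RecordPointer();

        ListScan(List<Map<String, Long>> records) { this.source = records.iterator(); }

        @Override
        public RecordPointer getRecordPointer() { return pointer; }

        @Override
        public NextOutcome next() {
            if (!source.hasNext()) return NextOutcome.NONE_LEFT;
            pointer.set(source.next());
            return NextOutcome.INCREMENTED;
        }
    }

    /** Filter ROP: re-exposes its input's pointer and skips records failing the condition. */
    static final class FilterRop implements RecordIterator {
        private final RecordIterator input;
        private final Predicate<RecordPointer> condition;

        FilterRop(RecordIterator input, Predicate<RecordPointer> condition) {
            this.input = input;
            this.condition = condition;
        }

        @Override
        public RecordPointer getRecordPointer() { return input.getRecordPointer(); }

        @Override
        public NextOutcome next() {
            while (input.next() == NextOutcome.INCREMENTED) {
                if (condition.test(input.getRecordPointer())) return NextOutcome.INCREMENTED;
            }
            return NextOutcome.NONE_LEFT;
        }
    }

    public static void main(String[] args) {
        RecordIterator scan = new ListScan(List.of(
                Map.of("x", 1L), Map.of("x", 7L), Map.of("x", 3L)));
        RecordIterator filtered = new FilterRop(scan, r -> r.getLong("x") > 2);
        RecordPointer record = filtered.getRecordPointer();   // obtained once, never re-fetched
        while (filtered.next() == NextOutcome.INCREMENTED) {
            System.out.println(record.getLong("x"));          // prints 7, then 3
        }
    }
}
```

> Because the filter hands back its input's pointer, no record is copied
> between operators in this sketch; only the pointer's target changes.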
>
> My next goals are to add tests, finish adding ROPs, add local and remote
> exchange nodes (for parallelization), write a lot more documentation, and
> extract the execution plan into a separate intermediate representation.
>
> It needs many more evaluators (as well as the remaining ROPs) to be a
> true reference interpreter. The existing ones can serve as templates. Is
> anyone interested in ripping through a batch of additional evaluators and
> their associated FunctionDefinitions?
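>
> For anyone picking this up, a hypothetical Java sketch of what a
> FunctionDefinition might carry, following the glossary description
> (aliases, allowed input arguments, output-type logic). The DataType enum,
> FunctionRegistry, the min/max argument-count scheme and the example
> subtract/minus function are all assumptions for illustration, not the
> interpreter's real API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a FunctionDefinition and alias lookup; not Drill's API.
class FunctionDefinitionSketch {

    enum DataType { LONG, DOUBLE, BYTES, BOOLEAN }

    /** Stand-in for the interface that attempts to determine the output type. */
    interface OutputTypeDeterminer {
        DataType determine(List<DataType> argTypes);
    }

    static final class FunctionDefinition {
        final Set<String> aliases;
        final int minArgs;
        final int maxArgs;
        final Set<DataType> allowedArgTypes;
        final OutputTypeDeterminer outputType;

        FunctionDefinition(Set<String> aliases, int minArgs, int maxArgs,
                           Set<DataType> allowedArgTypes, OutputTypeDeterminer outputType) {
            this.aliases = aliases;
            this.minArgs = minArgs;
            this.maxArgs = maxArgs;
            this.allowedArgTypes = allowedArgTypes;
            this.outputType = outputType;
        }

        /** True if the argument count and every argument type are allowed. */
        boolean accepts(List<DataType> argTypes) {
            return argTypes.size() >= minArgs
                    && argTypes.size() <= maxArgs
                    && allowedArgTypes.containsAll(argTypes);
        }
    }

    /** Case-insensitive lookup of definitions by any of their aliases. */
    static final class FunctionRegistry {
        private final Map<String, FunctionDefinition> byAlias = new HashMap<>();

        void register(FunctionDefinition def) {
            for (String alias : def.aliases) {
                byAlias.put(alias.toLowerCase(Locale.ROOT), def);
            }
        }

        /** Validates the call and returns the function's output type. */
        DataType resolve(String name, List<DataType> argTypes) {
            FunctionDefinition def = byAlias.get(name.toLowerCase(Locale.ROOT));
            if (def == null) {
                throw new IllegalArgumentException("Unknown function: " + name);
            }
            if (!def.accepts(argTypes)) {
                throw new IllegalArgumentException("Invalid arguments for " + name + ": " + argTypes);
            }
            return def.outputType.determine(argTypes);
        }
    }

    public static void main(String[] args) {
        // Hypothetical new function: subtract (alias "minus"), two numeric arguments;
        // the result is DOUBLE if either argument is DOUBLE, otherwise LONG.
        FunctionDefinition subtract = new FunctionDefinition(
                Set.of("subtract", "minus"), 2, 2,
                Set.of(DataType.LONG, DataType.DOUBLE),
                types -> types.contains(DataType.DOUBLE) ? DataType.DOUBLE : DataType.LONG);
        FunctionRegistry registry = new FunctionRegistry();
        registry.register(subtract);
        System.out.println(registry.resolve("minus", List.of(DataType.LONG, DataType.DOUBLE)));  // DOUBLE
    }
}
```

> Keeping the type logic in a small interface, as sketched here, would let
> each new function declare its own promotion rule without touching the
> evaluator that actually computes the value.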