drill-dev mailing list archives

From Jacques Nadeau <jacques.dr...@gmail.com>
Subject First pass at a reference interpreter
Date Mon, 14 Jan 2013 23:56:19 GMT
I've been pulling together a reference logical plan interpreter.  I'm
working with Ted to get it inside the Drill sandbox.  For now, you can find
it on my repo at https://github.com/jacques-n/incubator-drill (prototype
branch)



The goals of the reference interpreter are:


   - To provide a simple way to run a Logical Plan against some sample data
   and get back the expected result.
   - To allow work to start on the parsers while we scale up the performance
   and capabilities of the execution engine and optimizer.
   - To allow evaluation work on particular technical approaches, such as
   exploring the impact of hierarchical and schema-less data on query
   evaluation.

These goals do not include performance, memory handling, or efficiency.
Currently, the interpreter is a single-node, single-threaded process.  This
will change shortly so that it can also run as a clustered process.

The entry point is inside the /sandbox/prototype/exec/ref module:
org.apache.drill.exec.ref.ReferenceInterpreter.main().  The example program
uses two resources, simple-plan.json and donuts.json, and writes its output
to /opt/data/out.json.


Some of the things that 'work':


   - Read/write basic json.
   - ROPs (reference operators): Filter, Transform, Group, Aggregate
   (simple), Order, Union.
   - Example aggregate and basic functions including sum, count, multiply,
   add, compare, equals.

Basic glossary/concepts (we'll get this on the wiki/javadocs):


   - LOP: Logical Operator.  An implementation-agnostic data flow operator
   used by the Logical Plan.
   - ROP: Reference Operator.  A reference operator implementation that
   pairs with a LOP.
   - FunctionDefinition: A definition of a particular function.  Describes
   a set of aliases, an allowable set of input arguments and an interface that
   will attempt to determine output type.
   - BasicEvaluator: An implementation of a particular non-aggregate
   expression.  Receives a record pointer at creation time. Returns a
   DataValue.
   - AggregateEvaluator: An implementation of a particular aggregating
   function.  Is provided a record pointer at creation time.  Expects regular
   calls to addRecord() followed by a call to eval() which provides the
   aggregate value.
   - DataValue: A pointer to a particular data value.  Implementation
   classes include things like ScalarLong, ScalarBytes, SimpleMapValue and
   SimpleArrayValue.
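
To make the evaluator lifecycle in the glossary concrete, here is a minimal
sketch of how a record pointer, a BasicEvaluator and an AggregateEvaluator
could fit together.  Every type name below (the ...Sketch classes) is a
hypothetical stand-in written for this example; the real interfaces in the
prototype branch may well look different.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for DataValue: only exposes a long for this example.
interface DataValueSketch { long asLong(); }

// Hypothetical stand-in for a scalar value class such as ScalarLong.
final class ScalarLongSketch implements DataValueSketch {
    private final long value;
    ScalarLongSketch(long value) { this.value = value; }
    public long asLong() { return value; }
}

// Mutable pointer that always refers to the current record (assumed design).
final class RecordPointerSketch {
    private DataValueSketch current;
    void set(DataValueSketch v) { current = v; }
    DataValueSketch get() { return current; }
}

// Non-aggregate expression: evaluated against whatever the pointer holds now.
interface BasicEvaluatorSketch { DataValueSketch eval(); }

// Aggregate expression: addRecord() once per record, then eval() for the result.
interface AggregateEvaluatorSketch {
    void addRecord();
    DataValueSketch eval();
}

// Reads the current record's value; receives the pointer at creation time.
final class FieldEvaluatorSketch implements BasicEvaluatorSketch {
    private final RecordPointerSketch pointer;
    FieldEvaluatorSketch(RecordPointerSketch pointer) { this.pointer = pointer; }
    public DataValueSketch eval() { return pointer.get(); }
}

// A sum aggregate built on top of a child evaluator.
final class SumSketch implements AggregateEvaluatorSketch {
    private final BasicEvaluatorSketch child;
    private long total;
    SumSketch(BasicEvaluatorSketch child) { this.child = child; }
    public void addRecord() { total += child.eval().asLong(); }
    public DataValueSketch eval() { return new ScalarLongSketch(total); }
}

final class EvaluatorDemo {
    // Feeds each value through the pointer and returns the aggregated sum.
    static long sumOf(List<Long> values) {
        RecordPointerSketch pointer = new RecordPointerSketch();
        SumSketch sum = new SumSketch(new FieldEvaluatorSketch(pointer));
        for (long v : values) {
            pointer.set(new ScalarLongSketch(v)); // advance to the next record
            sum.addRecord();                      // fold it into the aggregate
        }
        return sum.eval().asLong();
    }

    public static void main(String[] args) {
        System.out.println(sumOf(Arrays.asList(1L, 2L, 3L))); // prints 6
    }
}
```

The point of the sketch is that evaluators are wired to the pointer once, at
creation, and never handed individual records afterwards.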

Records pass between ROPs through the
org.apache.drill.exec.ref.RecordIterator interface.  This is somewhat
inspired by the AttributeSource concepts from within the Lucene project.
(I'm planning to extend these concepts all the way to the individual
DataValues.)
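
As a rough illustration of the AttributeSource-style pattern mentioned above,
the sketch below has a consumer fetch a record pointer once and then call
next() repeatedly while the upstream operator updates that pointer in place.
This is not the actual RecordIterator interface; the method names next() and
getRecordPointer() and all ...Sketch types are assumptions made for the
example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical record pointer holding a single long per record.
final class PointerSketch {
    private long value;
    void set(long v) { value = v; }
    long get() { return value; }
}

// Hypothetical iterator contract: obtain the pointer once, then advance.
interface RecordIteratorSketch {
    PointerSketch getRecordPointer();
    boolean next(); // true if the pointer now refers to a new record
}

// Source operator backed by an in-memory list.
final class ListSourceSketch implements RecordIteratorSketch {
    private final Iterator<Long> source;
    private final PointerSketch pointer = new PointerSketch();
    ListSourceSketch(List<Long> rows) { this.source = rows.iterator(); }
    public PointerSketch getRecordPointer() { return pointer; }
    public boolean next() {
        if (!source.hasNext()) return false;
        pointer.set(source.next()); // update in place, no new object per record
        return true;
    }
}

// Filter operator: skips upstream records that are not above the threshold.
final class FilterSketch implements RecordIteratorSketch {
    private final RecordIteratorSketch upstream;
    private final long threshold;
    FilterSketch(RecordIteratorSketch upstream, long threshold) {
        this.upstream = upstream;
        this.threshold = threshold;
    }
    // The filter does not change records, so it shares the upstream pointer.
    public PointerSketch getRecordPointer() { return upstream.getRecordPointer(); }
    public boolean next() {
        while (upstream.next()) {
            if (upstream.getRecordPointer().get() > threshold) return true;
        }
        return false;
    }
}

final class IteratorDemo {
    // Runs source -> filter and collects the surviving values.
    static List<Long> run(List<Long> rows, long threshold) {
        RecordIteratorSketch filter =
            new FilterSketch(new ListSourceSketch(rows), threshold);
        PointerSketch pointer = filter.getRecordPointer(); // fetched once
        List<Long> out = new ArrayList<>();
        while (filter.next()) {
            out.add(pointer.get());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1L, 5L, 2L, 9L), 2L)); // prints [5, 9]
    }
}
```

Sharing one pointer down the chain avoids allocating an object per record,
which is the part of the Lucene design this sketch tries to mirror.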



My next goals are to add tests, finish adding ROPs, add local and remote
exchange nodes (parallelization), add a bunch of documentation, and extract
the Execution plan as a separate intermediate representation.



It needs a lot more evaluators to be a true reference interpreter (as well
as the rest of the ROPs).  The existing ones can be used as prototypes.
Anyone interested in ripping through a bunch of additional evaluators and
associated FunctionDefinitions?
