drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha" <asi...@maprtech.com>
Subject Re: Review Request 18347: Implement hash aggregation operator and a hash table (for Drill-335) and new templates for aggregate functions (Drill-318)
Date Thu, 06 Mar 2014 02:10:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18347/
-----------------------------------------------------------

(Updated March 6, 2014, 2:10 a.m.)


Review request for drill, Jacques Nadeau, Jason Altekruse, Mehant Baid, and Steven Phillips.


Changes
-------

This second patch is a cumulative patch that includes the previous patch plus the following
changes/enhancements:

1. It includes support for the code generation of the probe side of a hash join (previously,
we only needed the 'build' side for the hash aggregate)
2. Fix for a index out of bound error that I encountered while doing testing - this was related
to using the wrong TypedFieldId when accessing the hash table's container. This is resolved
by doing the creation of htContainer as part of ChainedHashTable and making a clone of it
in HashTableTemplate. 
3. Address a review comment: Allocate the links, hashValues and startIndices using our vector
allocators (off-heap) instead of on the heap.   


Bugs: Drill-318, Drill-335 and and
    https://issues.apache.org/jira/browse/Drill-318
    https://issues.apache.org/jira/browse/Drill-335
    https://issues.apache.org/jira/browse/and


Repository: drill-git


Description
-------

This patch contains the following main components: 
1. Implementation of the hash aggregation execution operator - this has two main parts: the
HashAggTemplate and the HashAggBatch.  
2. Implementation of a hash table which is used by the hash aggregation.  The hash table hash
two main parts: the HashTableTemplate and the ChainedHashTable. The hash table internally
uses the notion of 'BatchHolder' to keep track of all keys that can fit within one batch of
64K values.  New BatcHolder objects are created as needed.  Each BatchHolder has its own vector
container.  The HashAggregate also has a similar structure and it keeps track of the workspace
variables.  
 (NOTE: An initial design document for the hash aggregation and hash table was already attached
with Drill-335.  The document has not yet been updated with the latest implementation ...
will try to do that in the near future).
3. Jinfeng's changes to use workspace vectors in the generated code for aggregate functions
(previously, for streaming aggregate we only needed to maintain workspace variable for 1 running
group; however for hash aggregate we need to maintain it for all groups).  
4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is not valid anymore.
 I modified the template generation code for the aggregate functions such that they conform
to the new infrastructure.  
5. The original AggTemplate, AggBatch and Aggregator classes have been moved to corresponding
StreamingAggTemplate, StreamingAggBatch and StreamingAggregator in order to differentiate
it from hash aggregation.  These appear as new files but the code there has not changed. 



Diffs (updated)
-----

  common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java 5b46b78

  exec/java-exec/src/main/codegen/config.fmpp 8f1060a 
  exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION 
  exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION 
  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java PRE-CREATION 
  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java PRE-CREATION 
  exec/java-exec/src/main/codegen/templates/TypeHelper.java 8c72cf7 
  exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java f99150e

  exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java 1cb2380 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java e674eab 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java 7622865

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java dcdf1ea

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java befa9bf

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java
b0939f1 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java 57905fe

  exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java 41bb7c6

  exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 8462622 
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java aed4802 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java 0bade41 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
ec7244a 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java 2e2d7fd

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java PRE-CREATION

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java
86fea4e 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java
36344f9 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java
a8f39e3 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java
2683045 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java PRE-CREATION

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
509d13b 
  exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 6e764a8 
  exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java 0de64b4 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java ccf7758

  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSimpleFunctions.java
f6a8096 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java PRE-CREATION

  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
PRE-CREATION 
  exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION 

Diff: https://reviews.apache.org/r/18347/diff/


Testing
-------

I have run several tests manually as part of TestHashAggr...these tests use TPC-H data and
in particular a relatively large 'Orders' table.  However, I have not yet packaged the tests
to run as part of JUnit since the location and size of the parquet files needs to be figured
out. I will continue to work on that.  


Thanks,

Aman Sinha


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message