-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18347/
-----------------------------------------------------------
(Updated March 6, 2014, 2:10 a.m.)
Review request for drill, Jacques Nadeau, Jason Altekruse, Mehant Baid, and Steven Phillips.
Changes
-------
This second patch is a cumulative patch that includes the previous patch plus the following
changes/enhancements:
1. It includes support for the code generation of the probe side of a hash join (previously,
we only needed the 'build' side for the hash aggregate)
2. Fix for a index out of bound error that I encountered while doing testing - this was related
to using the wrong TypedFieldId when accessing the hash table's container. This is resolved
by doing the creation of htContainer as part of ChainedHashTable and making a clone of it
in HashTableTemplate.
3. Address a review comment: Allocate the links, hashValues and startIndices using our vector
allocators (off-heap) instead of on the heap.
Bugs: Drill-318, Drill-335 and and
https://issues.apache.org/jira/browse/Drill-318
https://issues.apache.org/jira/browse/Drill-335
https://issues.apache.org/jira/browse/and
Repository: drill-git
Description
-------
This patch contains the following main components:
1. Implementation of the hash aggregation execution operator - this has two main parts: the
HashAggTemplate and the HashAggBatch.
2. Implementation of a hash table which is used by the hash aggregation. The hash table hash
two main parts: the HashTableTemplate and the ChainedHashTable. The hash table internally
uses the notion of 'BatchHolder' to keep track of all keys that can fit within one batch of
64K values. New BatcHolder objects are created as needed. Each BatchHolder has its own vector
container. The HashAggregate also has a similar structure and it keeps track of the workspace
variables.
(NOTE: An initial design document for the hash aggregation and hash table was already attached
with Drill-335. The document has not yet been updated with the latest implementation ...
will try to do that in the near future).
3. Jinfeng's changes to use workspace vectors in the generated code for aggregate functions
(previously, for streaming aggregate we only needed to maintain workspace variable for 1 running
group; however for hash aggregate we need to maintain it for all groups).
4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is not valid anymore.
I modified the template generation code for the aggregate functions such that they conform
to the new infrastructure.
5. The original AggTemplate, AggBatch and Aggregator classes have been moved to corresponding
StreamingAggTemplate, StreamingAggBatch and StreamingAggregator in order to differentiate
it from hash aggregation. These appear as new files but the code there has not changed.
Diffs (updated)
-----
common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java 5b46b78
exec/java-exec/src/main/codegen/config.fmpp 8f1060a
exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION
exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION
exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java PRE-CREATION
exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java PRE-CREATION
exec/java-exec/src/main/codegen/templates/TypeHelper.java 8c72cf7
exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java f99150e
exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java 1cb2380
exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java e674eab
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java 7622865
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java dcdf1ea
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java befa9bf
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java
b0939f1
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java 57905fe
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java 41bb7c6
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 8462622
exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java aed4802
exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java 0bade41
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
ec7244a
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java 2e2d7fd
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java
86fea4e
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java
36344f9
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java
a8f39e3
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java
2683045
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
509d13b
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 6e764a8
exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java 0de64b4
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java ccf7758
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSimpleFunctions.java
f6a8096
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java PRE-CREATION
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
PRE-CREATION
exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION
Diff: https://reviews.apache.org/r/18347/diff/
Testing
-------
I have run several tests manually as part of TestHashAggr...these tests use TPC-H data and
in particular a relatively large 'Orders' table. However, I have not yet packaged the tests
to run as part of JUnit since the location and size of the parquet files needs to be figured
out. I will continue to work on that.
Thanks,
Aman Sinha
|