drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Timothy Chen" <tnac...@gmail.com>
Subject Re: Review Request 18347: Implement hash aggregation operator and a hash table (for Drill-335) and new templates for aggregate functions (Drill-318)
Date Sat, 01 Mar 2014 08:23:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18347/#review35899
-----------------------------------------------------------



exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
<https://reviews.apache.org/r/18347/#comment66713>

    There is no single test for these changes?


- Timothy Chen


On Feb. 21, 2014, 8:23 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18347/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2014, 8:23 p.m.)
> 
> 
> Review request for drill, John Morris, Jacques Nadeau, Jason Altekruse, and Mehant Baid.
> 
> 
> Bugs: Drill-318, Drill-335 and and
>     https://issues.apache.org/jira/browse/Drill-318
>     https://issues.apache.org/jira/browse/Drill-335
>     https://issues.apache.org/jira/browse/and
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> This patch contains the following main components: 
> 1. Implementation of the hash aggregation execution operator - this has two main parts:
the HashAggTemplate and the HashAggBatch.  
> 2. Implementation of a hash table which is used by the hash aggregation.  The hash table
hash two main parts: the HashTableTemplate and the ChainedHashTable. The hash table internally
uses the notion of 'BatchHolder' to keep track of all keys that can fit within one batch of
64K values.  New BatcHolder objects are created as needed.  Each BatchHolder has its own vector
container.  The HashAggregate also has a similar structure and it keeps track of the workspace
variables.  
>  (NOTE: An initial design document for the hash aggregation and hash table was already
attached with Drill-335.  The document has not yet been updated with the latest implementation
... will try to do that in the near future).
> 3. Jinfeng's changes to use workspace vectors in the generated code for aggregate functions
(previously, for streaming aggregate we only needed to maintain workspace variable for 1 running
group; however for hash aggregate we need to maintain it for all groups).  
> 4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is not valid
anymore.  I modified the template generation code for the aggregate functions such that they
conform to the new infrastructure.  
> 5. The original AggTemplate, AggBatch and Aggregator classes have been moved to corresponding
StreamingAggTemplate, StreamingAggBatch and StreamingAggregator in order to differentiate
it from hash aggregation.  These appear as new files but the code there has not changed. 

> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java
5b46b78 
>   exec/java-exec/src/main/codegen/config.fmpp cd2b2cc 
>   exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION 
>   exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION 
>   exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java PRE-CREATION 
>   exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java PRE-CREATION 
>   exec/java-exec/src/main/codegen/templates/TypeHelper.java 783f943 
>   exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java
f99150e 
>   exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java 1cb2380

>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java e674eab

>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java
7622865 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java dcdf1ea

>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java befa9bf

>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java
b0939f1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java
57905fe 
>   exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java 41bb7c6

>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java fd965a7

>   exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 6e58ec7

>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java 0bade41

>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
ec7244a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
2e2d7fd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java
86fea4e 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java
36344f9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java
a8f39e3 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java
2683045 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
3b8154e 
>   exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java 0de64b4

>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java
ccf7758 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java
PRE-CREATION 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
PRE-CREATION 
>   exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18347/diff/
> 
> 
> Testing
> -------
> 
> I have run several tests manually as part of TestHashAggr...these tests use TPC-H data
and in particular a relatively large 'Orders' table.  However, I have not yet packaged the
tests to run as part of JUnit since the location and size of the parquet files needs to be
figured out. I will continue to work on that.  
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message