hive-issues mailing list archives

From "ASF GitHub Bot (Jira)"
Subject [jira] [Work logged] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters
Date Wed, 14 Oct 2020 08:11:00 GMT


ASF GitHub Bot logged work on HIVE-24221:

                Author: ASF GitHub Bot
            Created on: 14/Oct/20 08:10
            Start Date: 14/Oct/20 08:10
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1544:

File path: ql/src/java/org/apache/hadoop/hive/ql/plan/
@@ -233,6 +235,23 @@ public static ExprNodeGenericFuncDesc and(List<ExprNodeDesc> exps)
     return new ExprNodeGenericFuncDesc(TypeInfoFactory.booleanTypeInfo, new GenericUDFOPAnd(), "and", flatExps);
+  /**
+   * Create an expression for computing a hash by recursively hashing given expressions by
+   * <pre>
+   * Input: HASH(A, B, C, D)
+   * Output: HASH(HASH(HASH(A,B),C),D)
+   * </pre>
+   */
+  public static ExprNodeGenericFuncDesc hash(List<ExprNodeDesc> exps) {
+    assert exps.size() >= 2;
+    ExprNodeDesc hashExp = exps.get(0);
+    for (int i = 1; i < exps.size(); i++) {
+      List<ExprNodeDesc> hArgs = Arrays.asList(hashExp, exps.get(i));
+      hashExp = new ExprNodeGenericFuncDesc(TypeInfoFactory.intTypeInfo, new GenericUDFMurmurHash(), "hash", hArgs);

Review comment:
       It seems we have an inconsistency in `GenericUDFMurmurHash`: it is registered as `murmur_hash` in the `FunctionRegistry`, but the UDF's annotation only has `hash`, and here we also simply use "hash".
   A change like this will most likely cause a lot of q.out changes; could you file a follow-up?
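The Javadoc of the new `hash(List<ExprNodeDesc>)` helper describes a left fold: an n-ary HASH(A, B, C, D) becomes nested two-argument calls HASH(HASH(HASH(A,B),C),D). The following is a minimal standalone sketch of that folding, assuming Java 16+ (for records). `HashFoldSketch`, `Expr`, `Column` and `Call` are hypothetical stand-ins, not Hive's `ExprNodeDesc` or `ExprNodeGenericFuncDesc`; the sketch only builds and prints the expression shape, it does not compute any hash.

```java
import java.util.List;

public class HashFoldSketch {
    // Hypothetical stand-in for an expression node (not Hive's ExprNodeDesc).
    interface Expr {}

    // Leaf node: a column reference printed by name.
    record Column(String name) implements Expr {
        @Override public String toString() { return name; }
    }

    // Hypothetical two-argument call node, standing in for a binary hash UDF call.
    record Call(String fn, Expr left, Expr right) implements Expr {
        @Override public String toString() { return fn + "(" + left + "," + right + ")"; }
    }

    // Left-fold the inputs into nested binary calls:
    // [A, B, C, D] -> hash(hash(hash(A,B),C),D)
    static Expr hash(List<? extends Expr> exps) {
        if (exps.size() < 2) {
            throw new IllegalArgumentException("need at least two expressions");
        }
        Expr acc = exps.get(0);
        for (int i = 1; i < exps.size(); i++) {
            acc = new Call("hash", acc, exps.get(i)); // wrap the accumulator with the next input
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Column> cols = List.of(new Column("A"), new Column("B"), new Column("C"), new Column("D"));
        System.out.println(hash(cols)); // prints hash(hash(hash(A,B),C),D)
    }
}
```

Because every node has exactly two children, each step has the same shape as a binary expression, which is what makes the nested form easier to vectorize than a single n-ary call.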

This is an automated message from the Apache Git Service.

Issue Time Tracking

    Worklog Id:     (was: 500527)
    Time Spent: 20m  (was: 10m)

> Use vectorizable expression to combine multiple columns in semijoin bloom filters
> ---------------------------------------------------------------------------------
>                 Key: HIVE-24221
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>         Environment: 
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
> Currently, multi-column semijoin reducers use an n-ary call to GenericUDFMurmurHash to combine multiple values into one, which is then used as an entry in the bloom filter. However, no vectorized operator handles n-ary inputs; this includes the vectorized implementation of GenericUDFMurmurHash introduced in HIVE-23976.
> The goal of this issue is to choose an alternative way to combine multiple values into one before passing it to the bloom filter, using only vectorized operators.
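To illustrate the goal stated in the issue, this sketch combines one row's column values into a single int key using only two-argument hash calls, then stores that key in a small bloom filter. Everything here is hypothetical: `hash2` uses `java.util.Objects.hash` purely as a stand-in for a binary hash function (it is not GenericUDFMurmurHash), `IntBloom` is a toy filter rather than Hive's bloom filter class, and the code is scalar, not vectorized. It only shows that every combining step has the two-input shape a binary vectorized expression could evaluate.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Objects;

public class MultiColumnBloomSketch {
    // Hypothetical binary hash; Objects.hash is an illustrative stand-in,
    // NOT Hive's GenericUDFMurmurHash.
    static int hash2(Object a, Object b) {
        return Objects.hash(a, b);
    }

    // Combine a row's column values with binary hash calls only:
    // [a, b, c] -> hash2(hash2(a, b), c)
    static int combine(List<?> values) {
        if (values.size() < 2) throw new IllegalArgumentException("need at least two columns");
        int h = hash2(values.get(0), values.get(1));
        for (int i = 2; i < values.size(); i++) {
            h = hash2(h, values.get(i)); // fold the next column into the running hash
        }
        return h;
    }

    // Toy bloom filter over int keys using double hashing (hypothetical, not Hive's).
    static final class IntBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        IntBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // i-th probe position: h1 + i * h2, where h2 is a rotated copy of the key forced odd.
        private int index(int key, int i) {
            int h2 = Integer.rotateLeft(key, 16) | 1;
            return Math.floorMod(key + i * h2, numBits);
        }

        void add(int key) {
            for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
        }

        boolean mightContain(int key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) return false; // a clear bit means definitely absent
            }
            return true; // all bits set: possibly present
        }
    }

    public static void main(String[] args) {
        IntBloom bloom = new IntBloom(1024, 3);
        bloom.add(combine(List.of(1, "x", 10L)));
        bloom.add(combine(List.of(2, "y", 20L)));
        System.out.println(bloom.mightContain(combine(List.of(1, "x", 10L)))); // prints true
    }
}
```

A key built the same way on the probe side always maps to the same bits, so rows that were added are never rejected; unrelated keys may still produce false positives, as with any bloom filter.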

This message was sent by Atlassian Jira
