hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <>
Subject [jira] [Work logged] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters
Date Thu, 01 Oct 2020 22:39:00 GMT


ASF GitHub Bot logged work on HIVE-24221:

                Author: ASF GitHub Bot
            Created on: 01/Oct/20 22:38
            Start Date: 01/Oct/20 22:38
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request #1544:

   ### What changes were proposed in this pull request?
   Use hash(hash(hash(a,b),c),d) instead of hash(a,b,c,d) when constructing
   the multi-col semijoin reducer.
   ### Why are the changes needed?
   To allow fully vectorized execution of multi-col semijoin reducers.
   ### Does this PR introduce _any_ user-facing change?
   Only changes in EXPLAIN plans.
   ### How was this patch tested?
   `mvn test -Dtest=TestTezPerfCliDriver -Dqfile="query50.q"`
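The rewrite proposed above replaces one four-argument hash call with a chain of two-argument calls. After the rewrite, every node in the expression takes exactly two inputs. The minimal Python sketch below shows that left fold. `hash2` and `nested_hash` are hypothetical names. `hash2` combines its inputs with a multiply and an XOR, then applies the MurmurHash3 64-bit finalizer. It only stands in for a binary hash: it is not Hive's GenericUDFMurmurHash and will not reproduce its values.

```python
from functools import reduce

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def hash2(a: int, b: int) -> int:
    """Hypothetical binary hash (NOT Hive's GenericUDFMurmurHash).

    Mixes the two inputs, then applies the MurmurHash3 fmix64 finalizer.
    """
    h = ((a * 0x9E3779B97F4A7C15) ^ b) & MASK64
    h ^= h >> 33
    h = (h * 0xFF51AFD7ED558CCD) & MASK64
    h ^= h >> 33
    h = (h * 0xC4CEB9FE1A85EC53) & MASK64
    h ^= h >> 33
    return h


def nested_hash(values):
    """Left fold: for [a, b, c, d] this is hash2(hash2(hash2(a, b), c), d)."""
    if len(values) < 2:
        raise ValueError("need at least two columns")
    return reduce(hash2, values)


# Usage: the fold equals the explicitly nested form.
a, b, c, d = 10, 20, 30, 40
assert nested_hash([a, b, c, d]) == hash2(hash2(hash2(a, b), c), d)
```

Whichever combining form is used, the same expression must be applied where the bloom filter is built and where it is probed. A bloom filter only recognizes keys that were hashed the same way.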


Issue Time Tracking

            Worklog Id:     (was: 493721)
    Remaining Estimate: 0h
            Time Spent: 10m

> Use vectorizable expression to combine multiple columns in semijoin bloom filters
> ---------------------------------------------------------------------------------
>                 Key: HIVE-24221
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>         Environment: 
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
> Currently, multi-column semijoin reducers use an n-ary call to GenericUDFMurmurHash to combine multiple values into one, which is then used as an entry in the bloom filter. However, there are no vectorized operators that handle n-ary inputs, and this also holds for the vectorized implementation of GenericUDFMurmurHash introduced in HIVE-23976.
> The goal of this issue is to find an alternative way to combine multiple values into the single value passed to the bloom filter, using only vectorized operators.
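The issue hinges on evaluation shape. An n-ary hash has no vectorized counterpart. A chain of binary calls, in contrast, can be evaluated over a batch of rows one column-wise step per nesting level. The Python sketch below illustrates that evaluation order, using plain lists as stand-ins for column vectors. `combine`, `combine_columns`, and `combine_batch` are hypothetical names. The toy combiner `31*h + v mod 2**64` is not Hive's hash. Hive's vectorized engine operates on its own row-batch structures rather than Python lists.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def combine(h: int, v: int) -> int:
    """Toy binary combiner (hypothetical, not Hive's hash): 31*h + v mod 2**64."""
    return (31 * h + v) & MASK64


def combine_columns(left: list[int], right: list[int]) -> list[int]:
    """One binary, column-wise step: apply combine element-wise to two
    equally sized column vectors and return a new column vector."""
    if len(left) != len(right):
        raise ValueError("column vectors must have the same length")
    return [combine(h, v) for h, v in zip(left, right)]


def combine_batch(columns: list[list[int]]) -> list[int]:
    """Evaluate combine(combine(combine(c0, c1), c2), c3) over a whole batch,
    one binary column-wise step per nesting level."""
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    out = combine_columns(columns[0], columns[1])
    for col in columns[2:]:
        out = combine_columns(out, col)
    return out


# Usage: a batch of two rows with three key columns.
batch = [[1, 2], [3, 4], [5, 6]]
print(combine_batch(batch))  # prints [1059, 2052]
```

Each intermediate step consumes two input vectors and produces one output vector. Any number of key columns therefore reduces to repeated applications of a single binary vectorized operation.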

