hive-issues mailing list archives

From "Stamatis Zampetakis (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers
Date Fri, 02 Oct 2020 12:46:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206145#comment-17206145 ]

Stamatis Zampetakis commented on HIVE-23976:
--------------------------------------------

Hi [~abstractdog],

While working on HIVE-24221, I got some further questions/ideas regarding this issue.

It seems that we make use of n-ary vectorized expressions for the evaluation of AND and OR
operators; it's true that this is not done with the descriptor but through {{VectorizationContext}}.
I am not sure what this means in terms of efficiency, but it looks like we are saving at least
some memory, since I get the impression that we can reuse the output vector instead of having a
different output vector per pair of binary operations. We could employ something similar for
an n-ary hash function.
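
To make the idea concrete, here is a minimal sketch of an n-ary hash that folds every input column into one shared output vector. Everything in it is hypothetical: the class {{NaryHashSketch}}, the method names, and the plain {{long[]}}/{{int[]}} arrays stand in for column vectors and are not Hive's vectorization API. {{mix}} is the 32-bit Murmur3 finalizer, used only as a placeholder for the real Murmur call.

```java
class NaryHashSketch {
    // Placeholder per-value hash: the Murmur3 32-bit finalizer (fmix32)
    // applied to a folded long. This is NOT Hive's Murmur implementation.
    static int mix(long v) {
        int h = (int) (v ^ (v >>> 32));
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Folds all input columns into a single output vector:
    // out[i] = 31 * out[i] + mix(col[i]) for each column, in order.
    // No intermediate vector is allocated per pair of columns.
    static void hashColumns(long[][] cols, int size, int[] out) {
        java.util.Arrays.fill(out, 0, size, 0); // reset so the vector can be reused
        for (long[] col : cols) {
            for (int i = 0; i < size; i++) {
                out[i] = 31 * out[i] + mix(col[i]);
            }
        }
    }
}
```

With two input columns this reduces to 31*mix(a) + mix(b) per row, and the same {{out}} array can be passed again for the next batch.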

Assuming that we cannot/should not treat the hash as an n-ary operator, I think it makes
more sense to make it unary (single input, single output) instead of binary, so that it is only a
kind of wrapper around Murmur for the different datatypes. By doing this, the implementation
will be simpler and we can cover more use cases, as the combine step is delegated to another
abstraction.

+Currently+ 
{noformat}
hash(a,b) = 31*murmur(a) + murmur(b)
{noformat}

+After+
{noformat}
hash(a) = murmur(a)
{noformat}
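
A rough sketch of that split, under stated assumptions: {{UnaryHashSketch}}, {{hash}} and {{combine}} are hypothetical names, and {{fmix32}} (the 32-bit Murmur3 finalizer) is only a stand-in for the real Murmur call on each datatype. The point is that hashing is one-input/one-output per datatype, while combining lives in a separate step.

```java
class UnaryHashSketch {
    // Stand-in for Murmur: the Murmur3 32-bit finalizer. Not Hive's implementation.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Unary wrappers: single input, single output, one overload per datatype.
    static int hash(long v) {
        return fmix32((int) (v ^ (v >>> 32)));
    }

    static int hash(String s) {
        byte[] bytes = s.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        return fmix32(java.util.Arrays.hashCode(bytes));
    }

    // The combine step, kept separate from hashing.
    // For two inputs this yields 31 * h1 + h2; for one input it is just h1.
    static int combine(int... hashes) {
        int acc = 0;
        for (int h : hashes) {
            acc = 31 * acc + h;
        }
        return acc;
    }
}
```

Under this layout, {{combine(hash(a), hash(b))}} gives the same value as the current binary form, and a single-column case needs no special handling.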

What do you think?

> Enable vectorization for multi-col semi join reducers
> -----------------------------------------------------
>
>                 Key: HIVE-23976
>                 URL: https://issues.apache.org/jira/browse/HIVE-23976
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. However, the implementation relies on GenericUDFMurmurHash, which is not vectorized, so the respective operators cannot be executed in vectorized mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
