hive-issues mailing list archives

From "Stamatis Zampetakis (Jira)" <>
Subject [jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers
Date Fri, 02 Oct 2020 12:46:00 GMT


Stamatis Zampetakis commented on HIVE-23976:

Hi [~abstractdog],

While working on HIVE-24221, I got some further questions/ideas regarding this issue.

It seems that we make use of n-ary vectorized expressions for the evaluation of AND and OR
operators; it's true that this is not done with the descriptor but through {{VectorizationContext}}.
I am not sure what this means in terms of efficiency, but it looks like we are saving at least
some memory, since I get the impression that we can reuse the output vector and not have a
different output vector per pair of binary operations. We could employ something similar for
an n-ary hash function.
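
To illustrate the memory argument, here is a minimal sketch that contrasts a chain of binary ANDs with a single n-ary AND. It does not use Hive classes: the {{long[]}} arrays stand in for column vectors (1 = true, 0 = false), and the class name, method names, and the {{allocations}} counter are all hypothetical, introduced only to make the difference visible.

```java
import java.util.Arrays;

public class NaryAndSketch {
    // Counts how many output vectors were allocated (illustration only).
    static int allocations = 0;

    static long[] newVector(int size) {
        allocations++;
        return new long[size];
    }

    // Binary AND: each call writes into its own freshly allocated output vector.
    static long[] binaryAnd(long[] left, long[] right, int size) {
        long[] out = newVector(size);
        for (int i = 0; i < size; i++) {
            out[i] = left[i] & right[i];
        }
        return out;
    }

    // Chained evaluation: (c0 AND c1) AND c2 ... allocates one vector per pair.
    static long[] chainedAnd(long[][] cols, int size) {
        long[] acc = cols[0];
        for (int c = 1; c < cols.length; c++) {
            acc = binaryAnd(acc, cols[c], size);
        }
        return acc;
    }

    // N-ary evaluation: a single output vector is reused for every input column.
    static long[] naryAnd(long[][] cols, int size) {
        long[] out = newVector(size);
        Arrays.fill(out, 1L);
        for (long[] col : cols) {
            for (int i = 0; i < size; i++) {
                out[i] &= col[i];
            }
        }
        return out;
    }
}
```

With k input columns, the chained version in this sketch allocates k-1 output vectors while the n-ary version allocates one; both produce the same result.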

Assuming that we cannot/should not treat the hash as an n-ary operator, then I think it makes
more sense to make it unary (single input, single output), instead of binary, being only a
kind of wrapper around Murmur for the different datatypes. By doing this the implementation
will be simpler and we can cover more use-cases as the combine step is delegated to another

hash(a,b) = 31*murmur(a) + murmur(b)

hash(a) = murmur(a)
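
A rough sketch of how the two formulas above could look when the hash is unary and the combine step is a separate operation. This is not Hive code: the class and method names are hypothetical, plain {{int[]}} arrays stand in for column vectors, and {{mix}} is only the 32-bit finalizer from MurmurHash3, used here as a stand-in for the real Murmur call.

```java
public class HashCombineSketch {
    // Stand-in for murmur(x): the fmix32 finalizer from MurmurHash3.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Unary hash: single input column, single output column.
    static void hashColumn(int[] in, int[] out, int size) {
        for (int i = 0; i < size; i++) {
            out[i] = mix(in[i]);
        }
    }

    // Separate combine step: acc = 31 * acc + next, written in place.
    static void combine(int[] acc, int[] next, int size) {
        for (int i = 0; i < size; i++) {
            acc[i] = 31 * acc[i] + next[i];
        }
    }

    // hash(a) = murmur(a); hash(a, b) = 31 * murmur(a) + murmur(b); and so on.
    static int[] hash(int[][] columns, int size) {
        int[] acc = new int[size];
        int[] scratch = new int[size];
        hashColumn(columns[0], acc, size);
        for (int c = 1; c < columns.length; c++) {
            hashColumn(columns[c], scratch, size);
            combine(acc, scratch, size);
        }
        return acc;
    }
}
```

In this sketch the unary hash knows nothing about how many columns are involved, so the same expression serves both the single-column and the multi-column case.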

What do you think?

> Enable vectorization for multi-col semi join reducers
> -----------------------------------------------------
>                 Key: HIVE-23976
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. However, the
> implementation relies on GenericUDFMurmurHash, which is not vectorized, so the respective
> operators cannot be executed in vectorized mode.

This message was sent by Atlassian Jira
