hive-issues mailing list archives

From "Jason Dere (JIRA)" <>
Subject [jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
Date Wed, 01 Jul 2015 01:46:04 GMT


Jason Dere updated HIVE-10673:
    Attachment: HIVE-10673.6.patch

patch v6 - review feedback from Vikram

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>                 Key: HIVE-10673
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch,
>                      HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch
>
> Analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found that about 2/3 of the
> CPU time was spent on sorting/merging.
> This is not possible with MR, but with other execution engines (such as Tez) it is
> possible to create a reduce-side join that uses unsorted inputs. Eliminating the sort
> may make it faster than a shuffle join. To join on unsorted inputs, the reducer can use
> the hash join algorithm. This requires the small tables in the join to fit in the
> reducer's hash table.
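
For illustration only, the idea in the description can be sketched roughly as below. This is not code from the patch: the function names (partition, hash_join_reducer, partitioned_hash_join), the tuple row layout, and the fixed reducer count are all assumptions made for the example. Rows from both sides are routed to reducers by hashing the join key, with no sorting, and each reducer builds an in-memory hash table from its share of the small side and probes it with its share of the big side.

```python
from collections import defaultdict


def partition(rows, num_reducers, key_index=0):
    """Route each row to a reducer by hashing its join key (no sorting)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(row[key_index]) % num_reducers].append(row)
    return buckets


def hash_join_reducer(small_rows, big_rows, key_index=0):
    """Inner hash join inside one reducer.

    Assumes this reducer's share of the small side fits in memory.
    """
    # Build phase: load the small side into a hash table keyed on the join key.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    # Probe phase: stream the big side in arrival order and look up matches.
    for row in big_rows:
        for match in table.get(row[key_index], ()):
            yield match + row


def partitioned_hash_join(small, big, num_reducers=4):
    """Partition both inputs the same way, then join each pair of partitions."""
    small_parts = partition(small, num_reducers)
    big_parts = partition(big, num_reducers)
    joined = []
    for small_part, big_part in zip(small_parts, big_parts):
        joined.extend(hash_join_reducer(small_part, big_part))
    return joined


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    big = [(2, "x"), (1, "y"), (2, "z"), (3, "w")]
    for joined_row in sorted(partitioned_hash_join(small, big, num_reducers=2)):
        print(joined_row)
```

Because both sides use the same hash function and reducer count, rows with equal keys always land in the same reducer, so each reducer only needs to hold its own slice of the small side.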

This message was sent by Atlassian JIRA
