hive-issues mailing list archives

From "Deepak Jaiswal (JIRA)" <j...@apache.org>
Subject [jira] [Work started] (HIVE-18200) Bucket Map Join : Use correct algorithm to pick the big table
Date Fri, 01 Dec 2017 21:38:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-18200 started by Deepak Jaiswal.
---------------------------------------------
> Bucket Map Join : Use correct algorithm to pick the big table
> -------------------------------------------------------------
>
>                 Key: HIVE-18200
>                 URL: https://issues.apache.org/jira/browse/HIVE-18200
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>
> Currently the algorithm to pick the big table is flawed due to the complexity associated with n-way joins.
> It could result in OOM. Consider the following scenario:
> CREATE TABLE tab_part (key int, value string) PARTITIONED BY (ds STRING) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
> CREATE TABLE tab (key int, value string) PARTITIONED BY (ds STRING) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> Let's say tab has a size of 2GB, tab_part has a size of 500MB, and noconditionaltasksize is 200MB. Then bucket map join should not happen, as at least one hash table will be more than 250 MB, which may cause OOM.
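
The arithmetic behind this scenario can be sketched roughly as follows. This is an illustration only, not Hive's actual selection code: the function names are hypothetical, and it assumes rows are spread evenly across buckets, that the two bucket counts are multiples of each other, and that each big-table bucket task loads `small_buckets / big_buckets` small-table buckets (at least one) into its hash table.

```python
MB = 1024 * 1024


def per_task_hash_table_bytes(small_table_bytes, small_buckets, big_buckets):
    """Estimate the hash table size loaded by one big-table bucket task.

    Assumption (not taken from Hive source): data is evenly distributed
    across buckets and each task loads max(1, small_buckets // big_buckets)
    buckets of the small table.
    """
    buckets_loaded = max(1, small_buckets // big_buckets)
    return small_table_bytes * buckets_loaded // small_buckets


def bucket_map_join_fits(small_table_bytes, small_buckets, big_buckets, threshold_bytes):
    """True if the estimated per-task hash table stays within the threshold."""
    estimate = per_task_hash_table_bytes(small_table_bytes, small_buckets, big_buckets)
    return estimate <= threshold_bytes


# Sizes and bucket counts from the scenario in the issue description.
tab = {"size": 2048 * MB, "buckets": 2}
tab_part = {"size": 500 * MB, "buckets": 4}
threshold = 200 * MB  # the 200MB noconditionaltasksize from the description

# Option 1: tab is the big table, so tab_part is hashed.
opt1 = per_task_hash_table_bytes(tab_part["size"], tab_part["buckets"], tab["buckets"])
# Option 2: tab_part is the big table, so tab is hashed.
opt2 = per_task_hash_table_bytes(tab["size"], tab["buckets"], tab_part["buckets"])

print(opt1 // MB, opt2 // MB)
print(bucket_map_join_fits(tab_part["size"], tab_part["buckets"], tab["buckets"], threshold))
print(bucket_map_join_fits(tab["size"], tab["buckets"], tab_part["buckets"], threshold))
```

Under these assumptions, neither choice of big table keeps the per-task hash table within the 200MB threshold, which is consistent with the description's conclusion that bucket map join should not be chosen here.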



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
