hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20491) Fix mapjoin size estimations for Fast implementation
Date Wed, 05 Sep 2018 06:54:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604008#comment-16604008 ]

Ashutosh Chauhan commented on HIVE-20491:
-----------------------------------------

Ok. As it currently stands, we always estimate assuming the fast hashtable and always use it.
What we will miss out on is this: if the estimate is high, we will turn off BJ altogether instead of
going with the more memory-efficient optimized version of the hashtable. I agree we can take this
improvement in a follow-up.
+1
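
To make the trade-off above concrete, here is a minimal sketch of the two behaviours. It is not Hive code: the function names, the `Plan` enum, and the `memory_budget` threshold are all hypothetical, and it assumes "BJ" refers to the map (broadcast) join this issue is about.

```python
from enum import Enum


class Plan(Enum):
    # Hypothetical labels for the three possible outcomes.
    MAPJOIN_FAST = "mapjoin with fast hashtable"
    MAPJOIN_OPTIMIZED = "mapjoin with optimized hashtable"
    NO_MAPJOIN = "mapjoin turned off"


def current_choice(fast_estimate: int, memory_budget: int) -> Plan:
    """Behaviour described in the comment: only the fast estimate is
    considered, so a high estimate disables the mapjoin outright."""
    if fast_estimate <= memory_budget:
        return Plan.MAPJOIN_FAST
    return Plan.NO_MAPJOIN


def follow_up_choice(fast_estimate: int, optimized_estimate: int,
                     memory_budget: int) -> Plan:
    """Possible follow-up: fall back to the more memory-efficient
    optimized hashtable before giving up on the mapjoin."""
    if fast_estimate <= memory_budget:
        return Plan.MAPJOIN_FAST
    if optimized_estimate <= memory_budget:
        return Plan.MAPJOIN_OPTIMIZED
    return Plan.NO_MAPJOIN


# Sizes are the runtime measurements from the issue description (bytes);
# the budget is an arbitrary value chosen to fall between them.
fast, optimized, budget = 1513584984, 1168439664, 1300000000
print(current_choice(fast, budget).value)
print(follow_up_choice(fast, optimized, budget).value)
```

With a budget between the two footprints, the first function gives up on the mapjoin while the second still picks the optimized hashtable, which is the case the comment says is currently missed.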

> Fix mapjoin size estimations for Fast implementation
> ----------------------------------------------------
>
>                 Key: HIVE-20491
>                 URL: https://issues.apache.org/jira/browse/HIVE-20491
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20491.01.patch, HIVE-20491.01wip02.patch, HIVE-20491.02.patch
>
>
> HIVE-19824 fixed the estimations, but it calculated them for the "optimized" implementation; the "fast" one has a slightly larger footprint.
> It also seems that the fast implementation is somewhat overestimated at runtime; that should be taken care of as well.
> | numkeys | implementation | compiler estimation | runtime estimation | runtime measurement | ce / rm | re / rm |
> | 25M | FAST | 1168435456 | 2189433712 | 1513584984 | .77 | 1.44 |
> | 25M | OPTIMIZED | 1168435456 | 1191203764 | 1168439664 | 1.00 | 1.01 |
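
The two ratio columns can be recomputed from the raw byte counts in the table. The sketch below assumes that "ce", "re" and "rm" abbreviate the compiler estimation, runtime estimation and runtime measurement columns; the dictionary layout and function name are illustrative only.

```python
# Byte counts copied from the table in the issue description.
rows = {
    "FAST": {
        "compiler_estimation": 1168435456,
        "runtime_estimation": 2189433712,
        "runtime_measurement": 1513584984,
    },
    "OPTIMIZED": {
        "compiler_estimation": 1168435456,
        "runtime_estimation": 1191203764,
        "runtime_measurement": 1168439664,
    },
}


def ratios(row):
    """Return (ce / rm, re / rm) for one table row."""
    measured = row["runtime_measurement"]
    return (row["compiler_estimation"] / measured,
            row["runtime_estimation"] / measured)


for implementation, row in rows.items():
    ce_rm, re_rm = ratios(row)
    print(f"{implementation}: ce/rm={ce_rm:.4f} re/rm={re_rm:.4f}")
```

A ratio below 1 means the estimate undershoots the measured footprint, and a ratio above 1 means it overshoots. On these numbers the compiler estimate is too low for FAST and the runtime estimate is too high, while both are close to the measurement for OPTIMIZED.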



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
