hive-issues mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10261) Data size can be underestimated when computed with partial column stats
Date Wed, 08 Apr 2015 16:03:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485439#comment-14485439 ]

Mostafa Mokhtar commented on HIVE-10261:
----------------------------------------

[~lirui]

Can you please attach the explain plan, along with the query and the actual number of rows
for the operator with the underestimation?

> Data size can be underestimated when computed with partial column stats
> -----------------------------------------------------------------------
>
>                 Key: HIVE-10261
>                 URL: https://issues.apache.org/jira/browse/HIVE-10261
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>
> With {{hive.stats.fetch.column.stats=true}}, we'll estimate data size with column stats
> when annotating operators with statistics. However, when column stats are partial, we're likely
> to underestimate data size, which may hurt performance, e.g. by picking a table that is not
> actually small as the small table for a map join.
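As a rough illustration of the failure mode described above, here is a simplified model (not Hive's actual stats code): each column that has stats contributes its average width to the row size, and columns without stats contribute nothing. The `ColumnStats` type, the function names, the 100-byte default width and the 10,000,000-byte small-table threshold are all hypothetical values chosen for the example; the fallback variant is just one possible mitigation, not necessarily the fix adopted for HIVE-10261.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnStats:
    # Hypothetical per-column statistic: average value width in bytes.
    avg_col_len: float


def estimate_size_partial(num_rows: int, columns: List[str],
                          stats: Dict[str, ColumnStats]) -> int:
    # Columns with no stats are silently skipped, so they add 0 bytes per row.
    row_width = sum(stats[c].avg_col_len for c in columns if c in stats)
    return int(num_rows * row_width)


def estimate_size_with_fallback(num_rows: int, columns: List[str],
                                stats: Dict[str, ColumnStats],
                                default_width: float) -> int:
    # Hypothetical mitigation: charge an assumed width for columns lacking stats.
    row_width = sum(stats[c].avg_col_len if c in stats else default_width
                    for c in columns)
    return int(num_rows * row_width)


def fits_map_join(estimated_size: int, threshold: int) -> bool:
    # In this model, a table qualifies as the map-join small table if its estimate fits.
    return estimated_size <= threshold


if __name__ == "__main__":
    THRESHOLD = 10_000_000  # hypothetical small-table limit in bytes
    columns = ["id", "name", "payload"]
    stats = {"id": ColumnStats(avg_col_len=8.0)}  # only "id" has column stats
    rows = 1_000_000

    partial = estimate_size_partial(rows, columns, stats)
    fallback = estimate_size_with_fallback(rows, columns, stats, default_width=100.0)
    print(partial, fits_map_join(partial, THRESHOLD))    # 8000000 True
    print(fallback, fits_map_join(fallback, THRESHOLD))  # 208000000 False
```

With only `id` covered, the partial estimate is 8,000,000 bytes and passes the hypothetical threshold, so the table would be chosen as the small side; charging a default width for the two unknown columns raises the estimate to 208,000,000 bytes and the table no longer qualifies.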



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
