hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)
Date Fri, 16 Dec 2016 12:45:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754322#comment-15754322 ]

Jesus Camacho Rodriguez edited comment on HIVE-15122 at 12/16/16 12:45 PM:
---------------------------------------------------------------------------

[~ashutoshc], could you review this patch?

For the new test case, PK-FK inference can be checked in the logs.
In that case, note the different row count: with the patch, it matches the row count of
the same query without the cast.
Stats without patch:
{code}
Statistics: Num rows: 889 Data size: 7112 Basic stats: COMPLETE Column stats: COMPLETE
{code}
Stats with patch:
{code}
Statistics: Num rows: 964 Data size: 7712 Basic stats: COMPLETE Column stats: COMPLETE
{code}


was (Author: jcamachorodriguez):
[~ashutoshc], could you review this patch?

For new test case, PK-FK inference can be checked in the logs.
For that particular case, stats without patch:
{code}
Statistics: Num rows: 889 Data size: 7112 Basic stats: COMPLETE Column stats: COMPLETE
{code}
While stats with patch:
{code}
Statistics: Num rows: 964 Data size: 7712 Basic stats: COMPLETE Column stats: COMPLETE
{code}

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> ------------------------------------------------------------
>
>                 Key: HIVE-15122
>                 URL: https://issues.apache.org/jira/browse/HIVE-15122
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-15122.01.patch, HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in LLAP.
> Snippet from the bad plan.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
>       Edges:
>         Map 2 <- Map 1 (BROADCAST_EDGE)
>         Map 3 <- Map 2 (BROADCAST_EDGE)
>         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
>         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
>         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: supplier
>                   filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
>                   Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
>                     Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: bigint)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: bigint)
>                         Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col1 (type: bigint)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: lineitem
>                   filterExpr: (l_suppkey is not null and l_orderkey is not null) (type: boolean)
>                   Statistics: Num rows: 2285121364 Data size: 63983407882 Basic stats: COMPLETE Column stats: PARTIAL
>                   Filter Operator
>                     predicate: (l_suppkey is not null and l_orderkey is not null) (type: boolean)
>                     Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL
>                     Select Operator
>                       expressions: l_orderkey (type: bigint), l_suppkey (type: int), l_extendedprice (type: double), l_discount (type: double), l_shipdate (type: date)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
>                       Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: bigint)
>                           1 UDFToLong(_col1) (type: bigint)
>                         outputColumnNames: _col1, _col2, _col4, _col5, _col6
>                         input vertices:
>                           0 Map 1
>                         Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL
>                         Reduce Output Operator
>                           key expressions: _col2 (type: bigint)
>                           sort order: +
>                           Map-reduce partition columns: _col2 (type: bigint)
>                           Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL
>                           value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: orders
>                   filterExpr: (o_orderkey is not null and o_custkey is not null) (type: boolean)
>                   Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (o_orderkey is not null and o_custkey is not null) (type: boolean)
>                     Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: o_orderkey (type: int), o_custkey (type: bigint)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col2 (type: bigint)
>                           1 UDFToLong(_col0) (type: bigint)
>                         outputColumnNames: _col1, _col4, _col5, _col6, _col8
>                         input vertices:
>                           0 Map 2
>                         Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col8 (type: bigint)
>                           sort order: +
>                           Map-reduce partition columns: _col8 (type: bigint)
>                           Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE
>                           value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Map 7
> {code}
> Note the Map 2 to Map 3 output.
> This causes a rather large join (120GB) to be categorized as a map-join.
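
As a rough illustration of the issue quoted above, here is a minimal Python sketch of why a widening cast such as int to bigint should not discard column stats: the cast is injective and order-preserving, so min, max and ndv of the cast expression equal those of the underlying column. This is not Hive's implementation. `ColStats`, `stats_after_widening_cast` and `estimate_join_rows` are hypothetical names, the join formula is the textbook rows_left * rows_right / max(ndv_left, ndv_right) estimate, and the min/max/ndv values in the usage note are assumed for illustration (only the row counts come from the plan).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColStats:
    """Hypothetical column statistics holder (not a Hive class)."""
    min_value: int
    max_value: int
    ndv: int  # number of distinct values


def stats_after_widening_cast(stats: Optional[ColStats]) -> Optional[ColStats]:
    """Propagate stats through a widening integer cast (e.g. int -> bigint).

    The cast maps every value to itself in a wider type, so it is injective
    and order-preserving: min, max and ndv carry over unchanged.
    """
    return stats


def estimate_join_rows(left_rows: int, right_rows: int,
                       left_key: Optional[ColStats],
                       right_key: Optional[ColStats]) -> Optional[int]:
    """Textbook equi-join estimate: |L| * |R| / max(ndv_L, ndv_R).

    Returns None when either key has no stats, which stands in for the case
    where the estimator has nothing reliable to work with.
    """
    if left_key is None or right_key is None:
        return None
    return left_rows * right_rows // max(left_key.ndv, right_key.ndv)
```

With the row counts from the plan (10000000 supplier rows, 2285121364 lineitem rows) and an assumed ndv of 10000000 on both keys, `estimate_join_rows(10_000_000, 2_285_121_364, key, stats_after_widening_cast(key))` gives 2285121364, i.e. the foreign-key side's row count, whereas passing `None` for the cast key (stats dropped) gives no estimate at all.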



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
