hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization
Date Fri, 27 May 2016 12:47:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304019#comment-15304019 ]

Gopal V commented on HIVE-13872:
--------------------------------

This is worse than I expected: the configuration objects do not have the projection and filters
set before they go into the MapOperator initialization.

Only code that runs beneath HiveInputFormat can access this information, because it is
transferred from the TableScan into the Configuration object only once pushProjectionAndFilters
has been called.
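
To illustrate the ordering problem, here is a minimal toy sketch, not Hive code. The class, the
methods pushProjection and readProjection, and the key "sketch.read.column.ids" are all
hypothetical stand-ins; a plain Map plays the role of the Configuration object. It only shows
that a consumer which reads the configuration before the push step sees nothing, while one that
reads it afterwards sees the projection.

```java
import java.util.HashMap;
import java.util.Map;

public class ProjectionOrderSketch {
    // Hypothetical key; stands in for whichever property carries the projected column ids.
    static final String READ_COLUMNS_KEY = "sketch.read.column.ids";

    // Stand-in for the step that copies the scan's projection into the configuration.
    static void pushProjection(Map<String, String> conf, String columnIds) {
        conf.put(READ_COLUMNS_KEY, columnIds);
    }

    // Stand-in for any consumer: it can only see what is in conf at the time of the call.
    static String readProjection(Map<String, String> conf) {
        return conf.getOrDefault(READ_COLUMNS_KEY, "<unset>");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Consumer initialized before the push step: the projection is not there yet.
        String seenAtInit = readProjection(conf);

        // The push step runs later, on the input-format side in this toy model.
        pushProjection(conf, "0,1");

        // Consumer that runs after the push step: the projection is visible.
        String seenAfterPush = readProjection(conf);

        System.out.println("before push: " + seenAtInit);
        System.out.println("after push:  " + seenAfterPush);
    }
}
```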

> Vectorization: Fix cross-product reduce sink serialization
> ----------------------------------------------------------
>
>                 Key: HIVE-13872
>                 URL: https://issues.apache.org/jira/browse/HIVE-13872
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.0
>            Reporter: Gopal V
>         Attachments: HIVE-13872.WIP.patch
>
>
> TPC-DS Q13 produces a cross-product when CBO does not simplify the query:
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 projection column num 1
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
>         ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)  
>  from store_sales
>      ,customer_demographics
>  where (
> ( 
>   customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'M'
>      )or
>      (
>    customer_demographics.cd_demo_sk = ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'U'
>      ))
> ;
> {code}
> {code}
>         Map 3 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer_demographics
>                   Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     sort order: 
>                     Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
