hive-issues mailing list archives

From "Ivan Suller (JIRA)" <>
Subject [jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result
Date Tue, 26 Mar 2019 10:25:00 GMT


Ivan Suller commented on HIVE-21509:

[~kgyrtkirk] it is possible. I already closed the ticket tracking that issue, because I couldn't
reproduce it anymore. But if it is a cache issue this is expected.

> LLAP may cache corrupted column vectors and return wrong query result
> ---------------------------------------------------------------------
>                 Key: HIVE-21509
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>            Priority: Major
> In some scenarios, LLAP might store column vectors in cache that are getting reused and
reset just before their original content would be written.
> The issue is a concurrency issue and is therefore flaky. It is not easy to reproduce, but
the odds of surfacing it can be improved by setting LLAP executor and IO thread counts
this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set;
>  * using TPCDS input data of store_sales table, which is in text format:
> {code:java}
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|')
> STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT ''
> {code}
>  * run a query on this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query will trigger
reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a Thread.sleep(250) at
the beginning of VectorDeserializeOrcWriter#run().
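
The failure mode described above (a buffer handed to an asynchronous writer, then reset and reused before the writer has read it) can be illustrated with a small, deterministic sketch. This is not Hive or LLAP code: the class `ReusedBufferDemo`, the static `cached` field, and the method `cacheAndReuse` are hypothetical names made up for illustration, and a `CountDownLatch` stands in for the suggested `Thread.sleep(250)` so that the unlucky ordering always occurs. It only assumes that the reset zeroes the buffer, which would be consistent with the wrong result of 0 reported above.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class ReusedBufferDemo {
    // Hypothetical stand-in for a cache slot; not an LLAP class.
    static long[] cached;

    // Hands a buffer to an asynchronous "cache writer", resets the original
    // buffer, and returns the minimum of what ended up in the cache.
    static long cacheAndReuse(boolean copyBeforeHandOff) {
        long[] vector = {2450816L, 2450817L, 2450818L};
        // Either share the very same array with the writer, or give it a private copy.
        long[] handedOff = copyBeforeHandOff ? vector.clone() : vector;
        CountDownLatch producerHasReset = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            try {
                // Forces the late-writer ordering that a sleep at the start of
                // the writer's run() would only make likely.
                producerHasReset.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            cached = handedOff.clone(); // the writer reads the buffer only now
        });
        writer.start();

        Arrays.fill(vector, 0L);      // producer resets the buffer for reuse
        producerHasReset.countDown(); // only then is the writer allowed to proceed
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.stream(cached).min().getAsLong();
    }

    public static void main(String[] args) {
        System.out.println("shared buffer: min = " + cacheAndReuse(false)); // 0
        System.out.println("private copy:  min = " + cacheAndReuse(true));  // 2450816
    }
}
```

With the shared buffer the cached minimum is 0, and with a private copy it is 2450816. Copying is only one way to avoid the problem in this toy model; the sketch makes no claim about how the actual fix for this ticket works.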

This message was sent by Atlassian JIRA