hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21407) Parquet predicate pushdown is not working correctly for char column types
Date Tue, 02 Apr 2019 14:52:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807831#comment-16807831 ]

Hive QA commented on HIVE-21407:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 19s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 44s{color} | {color:red} ql: The patch generated 13 new + 72 unchanged - 0 fixed = 85 total (was 72) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 53s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16822/dev-support/hive-personality.sh |
| git revision | master / 606d3f2 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16822/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16822/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> Parquet predicate pushdown is not working correctly for char column types
> -------------------------------------------------------------------------
>
>                 Key: HIVE-21407
>                 URL: https://issues.apache.org/jira/browse/HIVE-21407
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-21407.patch
>
>
> If the 'hive.optimize.index.filter' parameter is false, the filter predicate is not pushed
to Parquet, so the filtering happens only within Hive. If the parameter is true, the filter
is pushed to Parquet, but for a char type the value pushed to Parquet is padded
with spaces:
> {noformat}
>   @Override
>   public void setValue(String val, int len) {
>     super.setValue(HiveBaseChar.getPaddedValue(val, len), -1);
>   }
> {noformat} 
> So if we have a char(10) column which contains the value "apple" and the where condition
looks like 'where c='apple'', the value pushed to Parquet will be 'apple' followed by 5 spaces.
But the stored values are not padded, so no rows will be returned from Parquet.
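> The mismatch can be illustrated with a minimal, self-contained sketch in plain Java. It does not use any Hive or Parquet classes: 'padTo' is a hypothetical stand-in for what HiveBaseChar.getPaddedValue is described as doing above, and 'stripTrailingSpaces' is a hypothetical helper that only shows why an unpadded literal would match. Neither is taken from the attached patch.

```java
public class CharPaddingDemo {

    // Hypothetical stand-in for the padding described above:
    // truncate to len, then right-pad with spaces up to len.
    static String padTo(String val, int len) {
        if (val.length() >= len) {
            return val.substring(0, len);
        }
        StringBuilder sb = new StringBuilder(val);
        while (sb.length() < len) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Hypothetical helper: remove trailing spaces only, keep leading ones.
    static String stripTrailingSpaces(String val) {
        int end = val.length();
        while (end > 0 && val.charAt(end - 1) == ' ') {
            end--;
        }
        return val.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = "apple";             // value as written to the file, unpadded
        String pushed = padTo("apple", 10);  // literal as pushed for a char(10) column

        // An exact equality filter on the padded literal cannot match the stored value.
        System.out.println(pushed.equals(stored));                      // false

        // With the trailing spaces removed, the same comparison matches.
        System.out.println(stripTrailingSpaces(pushed).equals(stored)); // true
    }
}
```

This only models the string comparison. Whether the padding should be removed on the pushdown side or handled elsewhere is a design decision for the actual fix.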
> How to reproduce:
> {noformat}
> $ create table ppd (c char(10), v varchar(10), i int) stored as parquet;
> $ insert into ppd values ('apple', 'bee', 1),('apple', 'tree', 2),('hello', 'world',
1),('hello','vilag',3);
> $ set hive.optimize.ppd.storage=true;
> $ set hive.vectorized.execution.enabled=true;
> $ set hive.vectorized.execution.enabled=false;
> $ set hive.optimize.ppd=true;
> $ set hive.optimize.index.filter=true;
> $ set hive.parquet.timestamp.skip.conversion=false;
> $ select * from ppd where c='apple';
> +--------+--------+--------+
> | ppd.c  | ppd.v  | ppd.i  |
> +--------+--------+--------+
> +--------+--------+--------+
> $ set hive.optimize.index.filter=false; or set hive.optimize.ppd.storage=false;
> $ select * from ppd where c='apple';
> +-------------+--------+--------+
> |    ppd.c    | ppd.v  | ppd.i  |
> +-------------+--------+--------+
> | apple       | bee    | 1      |
> | apple       | tree   | 2      |
> +-------------+--------+--------+
> {noformat}
> The issue surfaced after the fix for [HIVE-21327|https://issues.apache.org/jira/browse/HIVE-21327]
was uploaded upstream. Before the HIVE-21327 fix, setting the parameter 'hive.parquet.timestamp.skip.conversion'
to true in the parquet_ppd_char.q test hid this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
