hive-issues mailing list archives

From László Bodor (Jira) <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-22941) Empty files are inserted into external tables after HIVE-21714
Date Thu, 27 Feb 2020 10:45:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046467#comment-17046467 ]

László Bodor edited comment on HIVE-22941 at 2/27/20 10:44 AM:
---------------------------------------------------------------

Issue reproduced:
{code}
export QTEST_LEAVE_FILES=true
mvn test -Dtest.output.overwrite=true -Pitests,hadoop-2 -Denforcer.skip=true -pl itests/qtest \
  -Dtest=TestMiniLlapLocalCliDriver -Dqfile=empty_files_non_bucketed.q
...
# from ~/repos/hive, branch HDP-3.1-maint: the table file is 0 bytes
ls -la itests/qtest/target/localfs/warehouse/t1/000000_0
-rw-r--r--  1 lbodor  staff  0 Feb 25 11:42 itests/qtest/target/localfs/warehouse/t1/000000_0
{code}
https://github.com/abstractdog/hive/commit/7e08a3f654d67848cc2f3a915ebb8294d98e4328
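
To check the reproduction without eyeballing `ls` output, a small scan for zero-byte files under the table directory can help. This is a minimal sketch, assuming the local warehouse layout shown above (itests/qtest/target/localfs/warehouse/t1, kept because of QTEST_LEAVE_FILES=true); `EmptyFileScan` and `findEmptyFiles` are hypothetical names, not part of Hive or its q-test framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Hive): lists zero-byte regular files below a
 * local table directory, e.g. the q-test warehouse directory of table t1.
 */
public class EmptyFileScan {

    /** Returns every regular file of size 0 below tableDir, sorted by path. */
    public static List<Path> findEmptyFiles(Path tableDir) throws IOException {
        try (Stream<Path> files = Files.walk(tableDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(EmptyFileScan::isEmpty)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Files.size throws a checked exception, so wrap it for use inside the stream.
    private static boolean isEmpty(Path file) {
        try {
            return Files.size(file) == 0L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the directory from the reproduction above.
        Path tableDir = Paths.get(args.length > 0 ? args[0]
                : "itests/qtest/target/localfs/warehouse/t1");
        List<Path> empty = findEmptyFiles(tableDir);
        empty.forEach(p -> System.out.println("empty file: " + p));
        // Non-zero exit status signals that the table directory contains empty files.
        System.exit(empty.isEmpty() ? 0 : 1);
    }
}
```

Run against the reproduction, it would be expected to report 000000_0; after a fix for external tables it should report nothing.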


An easy fix, but it causes an ACID/MM regression:
https://github.com/abstractdog/hive/commit/8e25b5ce11220e22dbe90958d52c63b52a482931

Not necessarily related, but there are other recent JIRAs about empty files; I'm linking them so that we are aware of each other: HIVE-22918 and HIVE-22938 (for MR).



> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>
>                 Key: HIVE-22941
>                 URL: https://issues.apache.org/jira/browse/HIVE-22941
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> There were multiple patches targeting an issue when INSERT OVERWRITE was ineffective if the input is empty:
> HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
> HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty
> Of these patches, HIVE-21714 seems to have a bad effect on external tables because of this part:
> https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268
> The original issue before HIVE-21714 was that the existing files in the table survived an INSERT OVERWRITE, so a subsequent select still returned rows (count(*) > 0). HIVE-21714 seems to enable writing empty files regardless of execution engine or table type, which is not the proper way: the proper solution would be to avoid writing empty files on Tez entirely (this is what HIVE-14014 was about). I found that changing the condition to...
> {code}
> if (!isTez && (isStreaming || this.isInsertOverwrite)) 
> {code}
> (which could be an easy solution for external tables) breaks some test cases (both full ACID and MM) in insert_overwrite.q, which suggests they somehow rely on the generated empty file. We need to find a proper solution that is applicable to all table types without polluting external tables.
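
The trade-off described above can be made concrete with a truth table of the two conditions. This is a minimal sketch under two assumptions: that the HIVE-21714 condition is the proposed expression without the `!isTez` guard (the description only says it writes empty files regardless of engine and table type), and that both conditions only decide whether a task with no input rows still creates an output file. `EmptyFileDecision` and its methods are hypothetical names, not Hive's actual FileSinkOperator code.

```java
/**
 * Hypothetical model (not Hive code) of the decision discussed above:
 * should a task that received no rows still create an empty output file?
 */
public class EmptyFileDecision {

    /** Assumed post-HIVE-21714 behaviour: independent of engine and table type. */
    public static boolean writesEmptyFileAfterHive21714(boolean isStreaming, boolean isInsertOverwrite) {
        return isStreaming || isInsertOverwrite;
    }

    /** Condition proposed in the description: never write empty files on Tez. */
    public static boolean writesEmptyFileProposed(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        System.out.println("isTez isStreaming isInsertOverwrite | after HIVE-21714 | proposed");
        // Print every combination of the three flags with the result of both conditions.
        for (boolean isTez : values) {
            for (boolean isStreaming : values) {
                for (boolean isInsertOverwrite : values) {
                    System.out.printf("%-5b %-11b %-17b | %-16b | %b%n",
                            isTez, isStreaming, isInsertOverwrite,
                            writesEmptyFileAfterHive21714(isStreaming, isInsertOverwrite),
                            writesEmptyFileProposed(isTez, isStreaming, isInsertOverwrite));
                }
            }
        }
    }
}
```

Under these assumptions, every row with isTez = true yields false for the proposed condition: that keeps external tables clean on Tez, but it also drops the empty file that the ACID/MM cases in insert_overwrite.q appear to depend on, since neither condition looks at the table type.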



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
