hive-issues mailing list archives

From László Bodor (Jira) <j...@apache.org>
Subject [jira] [Updated] (HIVE-22941) Empty files are inserted into external tables after HIVE-21714
Date Thu, 27 Feb 2020 10:11:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-22941:
--------------------------------
    Description: 
Multiple patches have targeted an issue where INSERT OVERWRITE was ineffective when the
input is empty:
HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty

Of these patches, HIVE-21714 seems to have a bad effect on external tables because
of this part:
https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268

The original issue was that the existing files in the table survived an insert overwrite, so
select(*)>0 still held afterwards. HIVE-21714 seems to enable writing empty files regardless of
execution engine, which is not the proper approach: the proper solution would be to avoid writing
empty files for Tez entirely (this is what HIVE-14014 was about). I found that changing the condition to...
{code}
if (!isTez && (isStreaming || this.isInsertOverwrite)) 
{code}
(which could be an easy solution for external tables) breaks some test cases (both full ACID
and MM) in insert_overwrite.q, which could mean they somehow rely on the generated empty file.
We need to find a proper solution that applies to all table types without polluting
external tables.
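
To make the effect of the condition quoted above easier to see, here is a minimal sketch that evaluates it for every flag combination. The class and method names (EmptyFileCondition, shouldCreateEmptyFile) are hypothetical and not part of Hive; only the boolean expression itself is taken from this message, with the three flags passed in as plain parameters.

```java
/** Hypothetical helper for illustration only; not part of Hive. */
final class EmptyFileCondition {
    private EmptyFileCondition() {}

    /**
     * The condition quoted in this message, with the flags passed as parameters.
     * Returns true when an empty file would be created under that condition.
     */
    static boolean shouldCreateEmptyFile(boolean isTez, boolean isStreaming,
                                         boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print the full truth table of the condition.
        for (boolean isTez : values) {
            for (boolean isStreaming : values) {
                for (boolean isInsertOverwrite : values) {
                    System.out.printf(
                        "isTez=%b isStreaming=%b isInsertOverwrite=%b -> createEmptyFile=%b%n",
                        isTez, isStreaming, isInsertOverwrite,
                        shouldCreateEmptyFile(isTez, isStreaming, isInsertOverwrite));
                }
            }
        }
    }
}
```

Under this condition no empty file is ever created on Tez, whatever the other two flags are. That would fit the report that the ACID and MM cases in insert_overwrite.q break with it, if those cases run on Tez and depend on the empty file.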

  was:
There were multiple patches targeting an issue when INSERT OVERWRITE was ineffective if the
input is empty:
HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty

From these patches, HIVE-21714 seems to have a bad effect on external tables, because
of this part:
https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268

The issue was that the original files in the table survived an insert overwrite, and select(*)>0
was after that. HIVE-21714 seems to enable writing empty files regardless of execution engine,
which is not the proper way, as the proper solution would be to completely avoid writing empty
files for Tez (this is what HIVE-14014 was about). I found that changing to logic to...
{code}
if (!isTez && (isStreaming || this.isInsertOverwrite)) 
{code}
(which could be an easy solution for external tables) breaks some test cases (both full ACID
and MM) in insert_overwrite.q, which could mean they rely somehow on the empty generated file.
We need to find a proper solution which is applicable for all table types without polluting
external tables.


> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>
>                 Key: HIVE-22941
>                 URL: https://issues.apache.org/jira/browse/HIVE-22941
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> Multiple patches have targeted an issue where INSERT OVERWRITE was ineffective when the
> input is empty:
> HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
> HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty
> Of these patches, HIVE-21714 seems to have a bad effect on external tables because
> of this part:
> https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268
> The original issue was that the existing files in the table survived an insert overwrite, so
> select(*)>0 still held afterwards. HIVE-21714 seems to enable writing empty files regardless of
> execution engine, which is not the proper approach: the proper solution would be to avoid writing
> empty files for Tez entirely (this is what HIVE-14014 was about). I found that changing the condition to...
> {code}
> if (!isTez && (isStreaming || this.isInsertOverwrite)) 
> {code}
> (which could be an easy solution for external tables) breaks some test cases (both full ACID
> and MM) in insert_overwrite.q, which could mean they somehow rely on the generated empty file.
> We need to find a proper solution that applies to all table types without polluting
> external tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
