hive-issues mailing list archives

From László Bodor (Jira) <>
Subject [jira] [Updated] (HIVE-22941) Empty files are inserted into external tables after HIVE-21714
Date Sat, 29 Feb 2020 07:40:00 GMT


László Bodor updated HIVE-22941:
    Attachment: HIVE-22941.02.patch

> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>                 Key: HIVE-22941
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>             Fix For: 4.0.0
>         Attachments: HIVE-22941.01.patch, HIVE-22941.02.patch, HIVE-22941.02.patch
> There were multiple patches targeting an issue where INSERT OVERWRITE was ineffective if the input was empty:
> HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
> HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty
> Of these patches, HIVE-21714 seems to have an adverse effect on external tables because of this part:
> The original issue before HIVE-21714 was that the existing files in the table survived an INSERT OVERWRITE, so a subsequent select(*) still returned rows. HIVE-21714 seems to enable writing empty files regardless of execution engine or table type. That is not the right approach: the proper solution would be to avoid writing empty files on Tez entirely (this is what HIVE-14014 was about). I found that changing the condition to...
> {code}
> if (!isTez && (isStreaming || this.isInsertOverwrite)) 
> {code}
> (which could be an easy solution for external tables) breaks some test cases (both full ACID and MM) in insert_overwrite.q, which suggests that they somehow rely on the generated empty file. We need to find a proper solution that applies to all table types without polluting external tables with empty files.
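To make the trade-off above concrete, the proposed condition can be modeled as a pure function and evaluated over all inputs. This is a hypothetical sketch: the class `EmptyFileDecision` and method `shouldWriteEmptyFile` are illustrative names, not Hive's actual FileSinkOperator code. It shows that under the proposed condition an INSERT OVERWRITE on Tez never produces an empty file, and that table type is not an input at all, so the rule cannot treat external tables differently from ACID or MM tables.

```java
// Hypothetical sketch: names are illustrative, not Hive's real code.
public class EmptyFileDecision {

    // Proposed rule from the discussion: never write an empty placeholder
    // file on Tez; otherwise write one only for streaming or INSERT OVERWRITE.
    // Note that the table type (external, full ACID, MM) is not consulted.
    static boolean shouldWriteEmptyFile(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print the full truth table for the proposed condition.
        for (boolean isTez : values) {
            for (boolean isStreaming : values) {
                for (boolean isInsertOverwrite : values) {
                    System.out.printf("isTez=%b isStreaming=%b isInsertOverwrite=%b -> %b%n",
                            isTez, isStreaming, isInsertOverwrite,
                            shouldWriteEmptyFile(isTez, isStreaming, isInsertOverwrite));
                }
            }
        }
    }
}
```

Every row with `isTez=true` yields `false`, which is consistent with the ACID and MM cases in insert_overwrite.q failing if they depend on that empty file being present.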

This message was sent by Atlassian Jira
