hive-issues mailing list archives

From "Karen Coppage (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-22474) Query based major compaction always creates only one bucket file
Date Wed, 17 Jun 2020 10:18:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Coppage resolved HIVE-22474.
----------------------------------
    Resolution: Duplicate

> Query based major compaction always creates only one bucket file
> ----------------------------------------------------------------
>
>                 Key: HIVE-22474
>                 URL: https://issues.apache.org/jira/browse/HIVE-22474
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>
> {code:sql}
> set hive.execution.engine=mr;
> drop table if exists tbl2;
> create table tbl2 (a int, b int) clustered by (a) into 2 buckets stored as ORC TBLPROPERTIES('bucketing_version'='2', 'transactional'='true', 'compactorthreshold.hive.compactor.delta.num.threshold'='3');
> insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
> insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
> delete from tbl2 where b = 2;
> insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);
> delete from tbl2 where a = 1;
> {code}
> With the above use case, after the major compaction finishes, the base directory contains only one bucket file, although the table is bucketed into 2 buckets. Before the compaction runs, the delta directories contain the correct number of bucket files, and the data is split accordingly.
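>
> The statements above stop before the compaction itself. A minimal sketch of how it might be triggered and the resulting files inspected, assuming query-based compaction is already enabled on the compactor side (for example hive.compactor.crud.query.based=true; this setting is an assumption and is not stated in the report):
> {code:sql}
> -- Request a major compaction; it is queued and runs asynchronously.
> alter table tbl2 compact 'major';
> -- Check the state of the queued request for tbl2.
> show compactions;
> -- Once the compaction has finished, list the files backing the remaining rows.
> -- For a table with 2 buckets one would expect two bucket files under the base directory.
> select distinct INPUT__FILE__NAME from tbl2;
> {code}
> Listing the table directory on the file system shows the same information; the virtual column is used here only to keep the check in SQL.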
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
