hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-24235) Drop and recreate table during MR compaction leaves behind base/delta directory
Date Mon, 26 Jul 2021 12:23:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-24235?focusedWorklogId=627701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-627701 ]

ASF GitHub Bot logged work on HIVE-24235:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/21 12:22
            Start Date: 26/Jul/21 12:22
    Worklog Time Spent: 10m 
      Work Description: klcopp commented on a change in pull request #2503:
URL: https://github.com/apache/hive/pull/2503#discussion_r676553716



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##########
@@ -716,6 +757,13 @@ void open(CompactionInfo ci) throws TException {
       }
       this.txnId = msc.openTxn(ci.runAs, TxnType.COMPACTION);
       status = TxnStatus.OPEN;
+
+      LockRequest lockRequest = createLockRequest(ci, txnId);
+      LockResponse res = msc.lock(lockRequest);

Review comment:
       Or does this happen automatically on commit/abort?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 627701)
    Time Spent: 1h  (was: 50m)

> Drop and recreate table during MR compaction leaves behind base/delta directory
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-24235
>                 URL: https://issues.apache.org/jira/browse/HIVE-24235
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> If a table is dropped and recreated during MR compaction, the table directory and a base (or delta, if minor compaction) directory could be created, with or without data, while the table "does not exist".
> E.g.
> {code:java}
> create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> insert into c values (9);
> insert into c values (9);
> alter table c compact 'major';
> -- While the compaction job is running:
> drop table c;
> create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> {code}
> The table directory should be empty, but it could look like this after the job finishes:
> {code:java}
> Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
> Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
> Oct  6 14:23 c/base_0000002_v0000101/bucket_00000
> {code}
> or perhaps just: 
> {code:java}
> Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
> {code}
> Insert another row and you have:
> {code:java}
> Oct  6 14:33 base_0000002_v0000101/
> Oct  6 14:33 base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:33 base_0000002_v0000101/.bucket_00000.crc
> Oct  6 14:33 base_0000002_v0000101/_orc_acid_version
> Oct  6 14:33 base_0000002_v0000101/bucket_00000
> Oct  6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
> Oct  6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
> Oct  6 14:35 delta_0000001_0000001_0000/_orc_acid_version
> Oct  6 14:35 delta_0000001_0000001_0000/bucket_00000_0
> {code}
> Selecting from the table will result in this error because the highest valid writeId for this table is 1:
> {code:java}
> thrift.ThriftCLIService: Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
>         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> ...
> Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.IOException: Not enough history available for (1,x).  Oldest available base: .../warehouse/b/base_0000004_v0000092
> {code}
> Solution: Resolve the table again after compaction is finished; compare the id with the table id from when compaction began. If the ids do not match, abort the compaction's transaction.
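
The proposed solution above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: FakeMetastore, CompactionGuard, and Outcome are hypothetical names invented here, and the in-memory map only stands in for a real metastore lookup. It shows the id comparison alone: a table that was dropped and recreated under the same name gets a new id, so the compaction's transaction should be aborted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a metastore; not the Hive API. */
class FakeMetastore {
    private final Map<String, Long> tableIds = new HashMap<>();
    private long nextId = 1;

    /** Registers a table under a fresh id and returns that id. */
    long createTable(String name) {
        long id = nextId++;
        tableIds.put(name, id);
        return id;
    }

    void dropTable(String name) {
        tableIds.remove(name);
    }

    /** Returns the current id for the name, or empty if no such table exists. */
    Optional<Long> resolveTableId(String name) {
        return Optional.ofNullable(tableIds.get(name));
    }
}

/** Hypothetical helper that decides what to do with the compaction's transaction. */
class CompactionGuard {
    enum Outcome { COMMIT, ABORT }

    /**
     * Commit only if the table still exists and has the same id it had
     * when compaction began; otherwise abort.
     */
    static Outcome decide(long idAtStart, Optional<Long> idAtEnd) {
        boolean sameTable = idAtEnd.isPresent() && idAtEnd.get().longValue() == idAtStart;
        return sameTable ? Outcome.COMMIT : Outcome.ABORT;
    }
}
```

In this sketch a caller would record the id when compaction starts, call resolveTableId again once the job finishes, and pass both values to decide. Comparing ids rather than names is what catches the drop-and-recreate case, since the name "c" is unchanged in the example above.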



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
