hive-issues mailing list archives

From "Vipin Vishvkarma (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
Date Wed, 10 Jun 2020 18:31:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132608#comment-17132608 ]

Vipin Vishvkarma commented on HIVE-21052:
-----------------------------------------

[~dkuzmenko] Sorry for the late reply.

[~asomani] and I have gone through the current patch and design doc. To summarize the design:
1. Add a dummy entry to TXN_COMPONENTS while taking a lock.
2. In addDynamicPartitions, remove the dummy entry and add the actual partitions.
3. If the txn is aborted before step 2, the dummy entry remains in TXN_COMPONENTS and signals
that cleanup is needed.
4. The initiator adds a row to COMPACTION_QUEUE (with type 'p') for that aborted txn, with
state READY_FOR_CLEANING.
5. Introduce a new type of cleanup (p-type) that handles such rows by doing a table-level
scan and deleting the aborted dirs.
6. Add a thread pool to the cleaner so that the above cleanup runs in parallel with the
regular cleanup.
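
To make the flow in the list above concrete, here is a minimal in-memory sketch of the marker
lifecycle. It is not code from the patch: the class name, the MARKER value, the method names
other than addDynamicPartitions, and the string format of the queue rows are all assumptions
made for illustration, and plain Java collections stand in for the TXN_COMPONENTS and
COMPACTION_QUEUE tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the design; not the Hive metastore implementation.
class AbortedTxnCleanupSketch {
    // Assumed placeholder value for the dummy entry; the real marker may differ.
    static final String MARKER = "<dummy>";

    // Stand-in for TXN_COMPONENTS: txn id -> "table/partition" entries.
    final Map<Long, Set<String>> txnComponents = new HashMap<>();
    // Ids of aborted transactions.
    final Set<Long> aborted = new HashSet<>();
    // Stand-in for COMPACTION_QUEUE rows, encoded as strings for brevity.
    final List<String> compactionQueue = new ArrayList<>();
    private long nextTxnId = 1;

    // Write the dummy entry when the lock is taken.
    long openTxnAndLock(String table) {
        long id = nextTxnId++;
        Set<String> components = new HashSet<>();
        components.add(table + "/" + MARKER);
        txnComponents.put(id, components);
        return id;
    }

    // Replace the dummy entry with the actual partitions.
    void addDynamicPartitions(long txnId, String table, List<String> partitions) {
        Set<String> components = txnComponents.get(txnId);
        components.remove(table + "/" + MARKER);
        for (String p : partitions) {
            components.add(table + "/" + p);
        }
    }

    void abort(long txnId) {
        aborted.add(txnId);
    }

    // An aborted txn that still has the dummy entry gets a p-type row
    // in state READY_FOR_CLEANING.
    void runInitiator() {
        for (long txnId : aborted) {
            Set<String> components =
                txnComponents.getOrDefault(txnId, Collections.<String>emptySet());
            for (String c : components) {
                if (c.endsWith("/" + MARKER)) {
                    String table = c.substring(0, c.length() - MARKER.length() - 1);
                    compactionQueue.add("p:" + table + ":txn=" + txnId + ":READY_FOR_CLEANING");
                }
            }
        }
    }

    public static void main(String[] args) {
        AbortedTxnCleanupSketch s = new AbortedTxnCleanupSketch();
        long t1 = s.openTxnAndLock("db.tbl");
        s.addDynamicPartitions(t1, "db.tbl", Arrays.asList("p=1"));
        s.abort(t1); // aborted after addDynamicPartitions: no marker left
        long t2 = s.openTxnAndLock("db.tbl");
        s.abort(t2); // aborted before addDynamicPartitions: marker remains
        s.runInitiator();
        System.out.println(s.compactionQueue);
    }
}
```

In this sketch only the second txn produces a p-type row, since its dummy entry was never
replaced. The table-level scan and directory deletion done by the p-type cleanup are left out.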

We found the following shortcomings in the current patch:
1. The multi-threaded solution is not complete until HIVE-21150 is fixed.
2. The current solution allows parallel cleanup on the same partition, because regular cleanup
only takes a shared lock. Either this needs to change, or, if parallel cleanup on the same
partition is acceptable, it is unclear why p-type cleanup needs an exclusive lock.
3. Only delta dir cleanup is handled. Cleanup of aborted IOW dirs is still missing in both
the static and dynamic partition cases for MM tables, and that data can be read once the
cleaner removes the entry from the TXN table.

So for now we have decided to go with a single-threaded cleaner and fix this for Hive 3
first, as our customers are blocked by this issue.

For Hive 4 we need some input, as we lack the expertise. Open questions:
1. Is there any concern with removing aborted base dirs, the way we remove aborted delta dirs
for MM tables in the worker?
2. As we don't see much benefit from the current multi-threaded cleaner implementation, should
we remove it for now?

> Make sure transactions get cleaned if they are aborted before addPartitions is called
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-21052
>                 URL: https://issues.apache.org/jira/browse/HIVE-21052
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0, 3.1.1
>            Reporter: Jaume M
>            Assignee: Jaume M
>            Priority: Critical
>         Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, HIVE-21052.10.patch,
HIVE-21052.11.patch, HIVE-21052.12.patch, HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch,
HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, HIVE-21052.8.patch, HIVE-21052.9.patch
>
>
> If the transaction is aborted between openTxn and addPartitions and data has been written
on the table the transaction manager will think it's an empty transaction and no cleaning
will be done.
> This is currently an issue in the streaming API and in micromanaged tables. As proposed
by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and when addPartitions
is called remove this entry from TXN_COMPONENTS and add the corresponding partition entry
to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that specifies
that a transaction was opened and then aborted, it must generate jobs for the worker for
every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
