hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Koifman (Jira)" <>
Subject [jira] [Assigned] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
Date Wed, 28 Jul 2021 14:14:04 GMT


Eugene Koifman reassigned HIVE-14211:

    Assignee:     (was: Eugene Koifman)

> AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
> ---------------------------------------------------------------------------------
>                 Key: HIVE-14211
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Priority: Major
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>    preserve the history, we can't use a base directory that includes a
>    transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are deltas that
contain X, we'll have to abort the txn.  This can't (reasonably) happen with auto commit mode,
but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour later
it decides to access some partition for which all txns < 20 (for example) have already
been compacted (i.e. GC'd).  
> ==========================================================
> Here is a more concrete example.  Let's say the file for table A are as follows and created
in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose base_5 "TxnBase
bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and delta_17_17
in _Directory_ object.  This would represent acceptable snapshot for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see everything
with txnid<17 as obsolete.  Then it will check lock manger state and decide to delete (as
there may not be any locks in LM for table A).  The order in which the files are deleted is
undefined right now.  It may delete delta_16_16 and delta_17_17 first and right at this moment
the read request with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by
some multi-stmt txn that started some time ago.  It acquires locks after the Cleaner checks
LM state and calls getAcidState(). This request will choose base_5 but it won't see delta_16_16
and delta_17_17 and thus return the snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The reason is
the a query (0) opens txn (if appropriate), (1) acquires locks, (2) locks in the snapshot.
 The cleaner won't delete anything for a given compaction (partition) if there are locks on
it.  Thus for duration of the transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest base_x as the
"best" and check against exceptions in ValidTxnList and throw an exception if there is an
exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only throw if there
is an open txn <=x.  Compaction throws away data from aborted txns and thus there is no
harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and prevent the
cleaner from cleaning anything delta with id range that includes this open txn id for any
txn that is still running.  This has a drawback of potentially delaying GC of old files for
arbitrarily long periods.  So this should be a user config choice.   The implementation is
not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of the other

This message was sent by Atlassian Jira

View raw message