hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL
Date Fri, 24 Aug 2018 23:00:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592292#comment-16592292 ]

Eugene Koifman commented on HIVE-18772:
---------------------------------------

HIVE-20459 would be nice to have here

> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---------------------------------------
>
>                 Key: HIVE-18772
>                 URL: https://issues.apache.org/jira/browse/HIVE-18772
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>
> Instead of using Lock Manager state, as it currently does.
> This will eliminate possible race conditions.
> See this [comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]
> Suppose A is the set of all ValidTxnLists across all active readers. Each ValidTxnList has a minOpenTxnId.
> MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all currently active readers.
> This means that no active transaction in the system sees any txn with txnid < X as open.
> This means that if we construct a ValidTxnList with HWM=X-1 and use it in getAcidState(), any files this call determines to be 'obsolete' will be seen as obsolete by any existing or future reader, i.e. they can be physically deleted.
> This is also necessary for multi-statement transactions, where relying on the state of the Lock Manager is not sufficient. For example:
> Suppose txn 17 starts at t1 and sees txnid 13 (with writeId 13) as open.
> 13 commits (via its parent txn) at t2 > t1. (17 is still running.)
> Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on Table1/Part1. (17 is still running.)
> Now delta_13 may be cleaned, since it can be seen as obsolete and there may be no locks on it, i.e. no one is reading it.
> Now at t4 > t3, 17 (a multi-statement txn) may need to read Table1/Part1. It cannot use base_14, since that may have absorbed delete events from delete_delta_14.
> Using MIN_HISTORY_LEVEL solves this.
> See description of HIVE-18747 for more details on MIN_HISTORY_LEVEL
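A minimal Java sketch of the reasoning in the description, under loudly stated assumptions: the class and method names (MinHistoryCleanerSketch, minHistoryLevel, deltaIsDeletable) are hypothetical and are not Hive's API; txn ids and write ids are treated as the same number, as in the "txnid 13 with writeId 13" example; the ValidTxnList with HWM=X-1 is modeled as having no open txns below the HWM (aborted txns are ignored); and txn 17's minOpenTxnId is assumed to be exactly 13. It shows that a cleaner snapshot built from X cannot treat delta_13 as obsolete while txn 17 is active, regardless of lock state.

```java
import java.util.List;

// Hypothetical, simplified model of a MIN_HISTORY_LEVEL-based cleaner decision.
// These are NOT Hive's classes; txn ids and write ids are assumed to coincide.
public class MinHistoryCleanerSketch {

    // X = min(minOpenTxnId) over the snapshots of all active readers.
    // Assumption: with no active readers, the caller passes the next txn id
    // that would be allocated, so nothing committed is hidden.
    public static long minHistoryLevel(List<Long> readerMinOpenTxnIds, long nextTxnId) {
        long x = nextTxnId;
        for (long id : readerMinOpenTxnIds) {
            x = Math.min(x, id);
        }
        return x;
    }

    // The cleaner's snapshot uses HWM = X - 1 (no open txns below it, by construction).
    // A compacted directory (base_N or delta_M_N) makes an older delta obsolete
    // only if N is visible under that HWM and the delta falls within its range.
    public static boolean deltaIsDeletable(long deltaMaxId, long compactedUpperBound, long x) {
        long hwm = x - 1;
        return compactedUpperBound <= hwm && deltaMaxId <= compactedUpperBound;
    }

    public static void main(String[] args) {
        // t3 in the example: txn 17 is still running and saw 13 as open,
        // so (by assumption) its minOpenTxnId is 13.
        long x = minHistoryLevel(List.of(13L), 18L);
        System.out.println("X = " + x + ", HWM = " + (x - 1));                          // X = 13, HWM = 12
        System.out.println("delete delta_13 after base_14? " + deltaIsDeletable(13, 14, x)); // false

        // Later: every reader that saw 13 as open has finished; assume the
        // oldest remaining snapshot has minOpenTxnId 20.
        long later = minHistoryLevel(List.of(20L), 21L);
        System.out.println("later: delete delta_13? " + deltaIsDeletable(13, 14, later));    // true
    }
}
```

Because X is bounded by the oldest snapshot still in use, HWM=12 hides base_14 from the cleaner's view and delta_13 survives until txn 17 ends; whether anyone currently holds a lock on delta_13 plays no part in the decision.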



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
