hive-issues mailing list archives

From "Peter Varga (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert
Date Tue, 25 Aug 2020 15:53:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184154#comment-17184154 ]

Peter Varga commented on HIVE-23725:
------------------------------------

[~kgyrtkirk] I will have to work on an upgraded version of this, because releasing the locks
when reexecuting causes problems when the degree of concurrent update queries on the same
partition is high. I will try to address your comments there and invite you to review.

A few questions, so that I can go in the right direction:

 * I have seen that the other plugin implementations use hooks, but my problem was that this
exception is thrown between the compilation and execution phases. Can I use the failure hook
there? Would the failure hook be called?
 * The whole reexecution count misery was introduced to make it possible for the new plugin
to have an individual config for the reexecution count that can be higher than HIVE_QUERY_MAX_REEXECUTION_COUNT,
while still keeping that config for every other plugin. If you have an idea how to solve this properly,
I am happy to implement it, because I do not like the current solution either.
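
To make the second question concrete, here is a rough sketch of one possible shape: a global cap with optional per-plugin overrides. Every name in it (ReexecutionLimits, override, maxFor, mayReexecute, the plugin names) is hypothetical and is not an existing Hive class or method; globalMax only stands in for HIVE_QUERY_MAX_REEXECUTION_COUNT.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Hive code: a global re-execution cap with per-plugin overrides. */
class ReexecutionLimits {
    // Stands in for HIVE_QUERY_MAX_REEXECUTION_COUNT (assumption for illustration).
    private final int globalMax;
    // Plugin name -> plugin-specific cap; plugins without an entry use the global cap.
    private final Map<String, Integer> overrides = new HashMap<>();

    ReexecutionLimits(int globalMax) {
        this.globalMax = globalMax;
    }

    /** Registers a plugin-specific cap, which may be higher than the global one. */
    void override(String pluginName, int max) {
        overrides.put(pluginName, max);
    }

    /** Returns the cap that applies to the given plugin. */
    int maxFor(String pluginName) {
        return overrides.getOrDefault(pluginName, globalMax);
    }

    /** True if the plugin may request one more re-execution after reexecutionsSoFar of them. */
    boolean mayReexecute(String pluginName, int reexecutionsSoFar) {
        return reexecutionsSoFar < maxFor(pluginName);
    }

    public static void main(String[] args) {
        ReexecutionLimits limits = new ReexecutionLimits(1);
        limits.override("snapshot-retry", 5); // hypothetical plugin name
        System.out.println(limits.mayReexecute("snapshot-retry", 3));
        System.out.println(limits.mayReexecute("some-other-plugin", 3));
    }
}
```

With this shape the driver would ask each plugin's own limit instead of tracking a separate counter for the new plugin; whether that fits the existing plugin interface is an open question.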

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> ------------------------------------------------------------------------
>
>                 Key: HIVE-23725
>                 URL: https://issues.apache.org/jira/browse/HIVE-23725
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and starts to read committed transactions that were not committed when the query compilation happened, it can cause partial read problems if the committed transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1, which merge inserts data from a partitioned source table that has a few partitions.
> 2. Open, run and commit transaction 2, which inserts data into an old and a new partition of the source table.
> 3. Open, run and commit transaction 3, which inserts data into the target table of the merge statement; this will retrigger a snapshot generation in transaction 1.
> 4. Run transaction 1. The snapshot will be regenerated, and it will read partial data from transaction 2, breaking the ACID properties.
> A different setup, with the transaction order switched:
> 1. Compile transaction 1, which inserts data into an old and a new partition of the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge inserts data from the source table into the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3. Since it contains transactions 1 and 2 in its snapshot, the isValidTxnListState check will be triggered and we do a partial read of transaction 1 for the same reasons.
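
The partial read described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Hive code: it assumes the set of partitions to read is fixed at compile time, and that a read returns rows that are both in a planned partition and written by a transaction visible in the snapshot. All class and method names (PartialReadDemo, Row, read, partitionsOf) are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a stale plan combined with a refreshed snapshot; not Hive code. */
class PartialReadDemo {
    /** One row: the partition it lives in and the id of the transaction that wrote it. */
    static final class Row {
        final String partition;
        final int writerTxn;
        Row(String partition, int writerTxn) {
            this.partition = partition;
            this.writerTxn = writerTxn;
        }
    }

    /** Returns rows that sit in a planned partition and were written by a visible transaction. */
    static List<Row> read(List<Row> table, Set<String> plannedPartitions, Set<Integer> visibleTxns) {
        List<Row> result = new ArrayList<>();
        for (Row row : table) {
            if (plannedPartitions.contains(row.partition) && visibleTxns.contains(row.writerTxn)) {
                result.add(row);
            }
        }
        return result;
    }

    /** Collects the partitions that currently exist in the table (what a compile step would see). */
    static Set<String> partitionsOf(List<Row> table) {
        Set<String> partitions = new HashSet<>();
        for (Row row : table) {
            partitions.add(row.partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Row> source = new ArrayList<>();
        source.add(new Row("p1", 0));                       // pre-existing data
        Set<String> compiledPlan = partitionsOf(source);    // transaction 1 compiles: only p1 exists

        source.add(new Row("p1", 2));                       // transaction 2: old partition
        source.add(new Row("p2", 2));                       // transaction 2: new partition
        Set<Integer> refreshedSnapshot = Set.of(0, 2);      // snapshot regenerated, plan untouched

        // Stale plan + fresh snapshot: only part of transaction 2 is read.
        System.out.println(read(source, compiledPlan, refreshedSnapshot).size());
        // Recompiling picks up p2, so all of transaction 2 is read.
        System.out.println(read(source, partitionsOf(source), refreshedSnapshot).size());
    }
}
```

In this model the stale plan sees transaction 2's row in p1 but misses its row in p2, which is why regenerating the snapshot alone is not enough and the query has to be recompiled as well.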



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
