hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-19867) Test and verify Concurrent INSERTS
Date Tue, 26 Jun 2018 00:31:00 GMT


Sergey Shelukhin commented on HIVE-19867:

We were discussing the partition case with [~ekoifman].
Tangentially based on that, I don't think we need this multi insert detection with current
We already have the valid write ID list "isEquivalent" check, so after multiple inserts in parallel
it doesn't matter who writes stats last; the stored list will simply not be isEquivalent, so no extra
checks are needed.
Can you describe a scenario where a reader gets invalid stats with concurrent writers (i.e.
where isEquivalent returns true but the stats are still invalid)? From the above I cannot
see it happening.
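
To make the argument above concrete, here is a minimal toy model in Python. It is not Hive code: the WriteIdList class, its fields, and stats_usable are hypothetical names, and the equivalence rule (two lists are equivalent when they expose the same set of visible write IDs) is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteIdList:
    """Hypothetical stand-in for a valid write ID list: a high watermark
    plus the write IDs at or below it that are not visible (open/aborted)."""
    high_watermark: int
    invalid_ids: frozenset = frozenset()

    def visible(self) -> frozenset:
        # Every write ID up to the watermark, minus the invisible ones.
        return frozenset(range(1, self.high_watermark + 1)) - self.invalid_ids

    def is_equivalent(self, other: "WriteIdList") -> bool:
        # Assumed rule: equivalent means "sees exactly the same writes".
        return self.visible() == other.visible()


def stats_usable(stored: WriteIdList, reader: WriteIdList) -> bool:
    """Stats are trusted only if the list stored with them matches the reader's."""
    return stored.is_equivalent(reader)


# Two inserts run in parallel with write IDs 1 and 2.
# Each writer stores stats together with the list it saw: its own write is
# visible, the other writer's write is still open.
stored_by_writer_1 = WriteIdList(high_watermark=2, invalid_ids=frozenset({2}))
stored_by_writer_2 = WriteIdList(high_watermark=2, invalid_ids=frozenset({1}))

# A reader that starts after both commits sees write IDs 1 and 2.
reader = WriteIdList(high_watermark=2)
```

In this model, whichever writer stored its stats last, the stored list differs from the reader's, so the reader ignores the stats and no separate multi-insert detection is involved.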

However, Eugene was suggesting that we actually redo the whole stats correctness to rely mostly
on the write path; in that case this approach (or rather a similar, more comprehensive one that handles
a couple more special cases) will help.
Actually, we may not even need to store the write ID list and txn in that case, only the last write
ID. But we'd also need to ensure that every query affecting data also affects stats, either by
updating them or by removing the flag/write ID (including queries with stats collection disabled,
alters, etc.).
I'll send an email with details to discuss.
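
As a rough sketch of the write-path idea above, assuming a single "last write ID" marker per table: every write either refreshes the stats and stamps its write ID, or clears the marker. TableStats, write_with_stats, write_without_stats, and can_use_stats are hypothetical names, and the reader-side rule (trust stats only when the stamped write ID is visible to the reader) is an assumption, not something stated in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TableStats:
    """Hypothetical per-table stats record for the write-path approach."""
    row_count: int = 0
    stats_write_id: Optional[int] = None  # None means the stats are not trusted


def write_with_stats(stats: TableStats, write_id: int, new_row_count: int) -> None:
    # A write that collects stats updates them and stamps its own write ID.
    stats.row_count = new_row_count
    stats.stats_write_id = write_id


def write_without_stats(stats: TableStats) -> None:
    # A write with stats collection disabled (or an alter) cannot keep the
    # stats accurate, so it removes the marker instead.
    stats.stats_write_id = None


def can_use_stats(stats: TableStats, visible_write_ids: Set[int]) -> bool:
    # Assumed reader rule: the marker must exist and be visible to the reader.
    return stats.stats_write_id is not None and stats.stats_write_id in visible_write_ids


table = TableStats()
write_with_stats(table, write_id=1, new_row_count=100)
trusted_after_insert = can_use_stats(table, {1})

write_without_stats(table)  # e.g. an insert with stats collection disabled
trusted_after_untracked_write = can_use_stats(table, {1, 2})
```

The point of the sketch is the invariant, not the exact check: a single code path that changes data without touching the marker would leave stale stats looking valid.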

> Test and verify Concurrent INSERTS  
> ------------------------------------
>                 Key: HIVE-19867
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>             Fix For: 4.0.0

This message was sent by Atlassian JIRA
