hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19838) simplify & fix ColumnizedDeleteEventRegistry load loop
Date Mon, 11 Jun 2018 23:05:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508925#comment-16508925
] 

Eugene Koifman commented on HIVE-19838:
---------------------------------------

I think one way {{totalDeleteEventCount}} in {{ColumnizedDeleteEventRegistry}} may be off
is that {{DeleteReaderValue}} takes a {{ValidWriteIdList}}, which means that {{next()}} may
skip an event because it belongs to a transaction that was not yet committed when the
current reader locked in its snapshot.
In practice, this would require a compaction (at least a minor one) that includes a txn
still open from the point of view of the reader's txn to complete before the VectorizedOrc
reader starts reading, which is possible but not very likely.
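
To make the mismatch concrete, here is a minimal sketch (not Hive code). The class {{SnapshotFilterSketch}}, the method {{visibleEvents}}, and the {{LongPredicate}} standing in for the snapshot check are all hypothetical names chosen for illustration. It assumes only what is described above: a count taken from file stats covers every event in the file, while a reader that filters by snapshot may return fewer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

public class SnapshotFilterSketch {

    /**
     * Returns the write ids of the delete events that pass the snapshot check.
     * The predicate is a hypothetical stand-in for a valid-write-id lookup.
     */
    static List<Long> visibleEvents(long[] eventWriteIds, LongPredicate isValid) {
        List<Long> visible = new ArrayList<>();
        for (long writeId : eventWriteIds) {
            if (isValid.test(writeId)) {
                visible.add(writeId);
            }
            // Events from write ids outside the snapshot are silently skipped.
        }
        return visible;
    }

    public static void main(String[] args) {
        // Four delete events in the (hypothetical) file; stats would report 4.
        long[] eventsInFile = {5, 5, 6, 7};
        int countFromStats = eventsInFile.length;

        // Assumption for the sketch: write id 6 was still open when the reader
        // locked in its snapshot, so its events are not visible.
        LongPredicate snapshot = writeId -> writeId != 6;

        List<Long> seen = visibleEvents(eventsInFile, snapshot);
        System.out.println("stats=" + countFromStats + " returned=" + seen.size());
    }
}
```

Any loop that sizes its arrays from the stats count and then expects exactly that many records back would be off by the number of skipped events in a case like this.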

Another issue, which I think is eliminated by the current patch, is:
{noformat}
        if (lastSeenOwid != deleteRecordKey.originalWriteId ||
          lastSeenBucketProperty != deleteRecordKey.bucketProperty) {
          ++distinctOwids;
          lastSeenOwid = deleteRecordKey.originalWriteId;
          lastSeenBucketProperty = deleteRecordKey.bucketProperty;
        }
{noformat}
{{distinctOwids}} is incremented when {{bucketProperty}} changes, even if
{{originalWriteId}} stays the same, which seems invalid even for bucketed tables.
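
A small sketch of the overcount, outside of Hive. The {{Key}} class, both method names, and the {{-1}} sentinels are hypothetical; the first method mirrors the condition quoted above, and the second is only one plausible reading of what "distinct owids" should mean, not necessarily what the patch does. It assumes keys arrive sorted by {{originalWriteId}}.

```java
import java.util.Arrays;
import java.util.List;

public class DistinctOwidSketch {

    /** Hypothetical stand-in holding only the two fields compared in the quoted snippet. */
    static final class Key {
        final long originalWriteId;
        final int bucketProperty;

        Key(long originalWriteId, int bucketProperty) {
            this.originalWriteId = originalWriteId;
            this.bucketProperty = bucketProperty;
        }
    }

    /** Counts a new "owid" whenever either field changes, as in the quoted condition. */
    static int countOwidOrBucketChanges(List<Key> sortedKeys) {
        long lastSeenOwid = -1;          // sentinel; assumes real ids are non-negative
        int lastSeenBucketProperty = -1; // sentinel; assumption of this sketch only
        int distinctOwids = 0;
        for (Key key : sortedKeys) {
            if (lastSeenOwid != key.originalWriteId
                || lastSeenBucketProperty != key.bucketProperty) {
                ++distinctOwids;
                lastSeenOwid = key.originalWriteId;
                lastSeenBucketProperty = key.bucketProperty;
            }
        }
        return distinctOwids;
    }

    /** Counts a new owid only when originalWriteId itself changes. */
    static int countOwidChangesOnly(List<Key> sortedKeys) {
        long lastSeenOwid = -1; // same sentinel assumption as above
        int distinctOwids = 0;
        for (Key key : sortedKeys) {
            if (lastSeenOwid != key.originalWriteId) {
                ++distinctOwids;
                lastSeenOwid = key.originalWriteId;
            }
        }
        return distinctOwids;
    }

    public static void main(String[] args) {
        // Write id 1 has deletes in two buckets; write id 2 in one.
        List<Key> keys = Arrays.asList(new Key(1, 0), new Key(1, 1), new Key(2, 0));
        System.out.println(countOwidOrBucketChanges(keys)); // prints 3
        System.out.println(countOwidChangesOnly(keys));     // prints 2
    }
}
```

With two distinct write ids in the input, the quoted condition reports three.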


> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>
>                 Key: HIVE-19838
>                 URL: https://issues.apache.org/jira/browse/HIVE-19838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19838.01.patch, HIVE-19838.patch
>
>
> Apparently sometimes the delete count in ACID stats doesn't match what merger actually
> returns.
> It could be due to some deltas having duplicate deletes from parallel queries (I guess?)
> that are being squashed by the merger or some other reasons beyond my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it fails with
> array index exception. Also, it could actually be done in a single loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
