hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <>
Subject [jira] [Commented] (HIVE-19838) simplify & fix ColumnizedDeleteEventRegistry load loop
Date Mon, 11 Jun 2018 23:05:00 GMT


Eugene Koifman commented on HIVE-19838:

I think one way {{totalDeleteEventCount}} in {{ColumnizedDeleteEventRegistry}} can be off
is that {{DeleteReaderValue}} takes a {{ValidWriteIdList}}, which means that {{next()}} may
skip an event because it belongs to a transaction that was not yet committed when the
current reader locked in its snapshot.
In practice, this would require a compaction (at least a minor one) that includes a txn
which is still open from the point of view of the reader's txn to complete before the
VectorizedOrc reader starts reading, which is possible but not very likely.

Another issue, which I think is eliminated by the current patch, is this check:

        if (lastSeenOwid != deleteRecordKey.originalWriteId ||
            lastSeenBucketProperty != deleteRecordKey.bucketProperty) {
          lastSeenOwid = deleteRecordKey.originalWriteId;
          lastSeenBucketProperty = deleteRecordKey.bucketProperty;

{{distinctOwids}} is incremented when {{bucketProperty}} changes, not only when
{{originalWriteId}} changes, which seems invalid even for bucketed tables.
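
To make the over-counting concrete, here is a minimal sketch, not the Hive code. The class
{{DistinctOwidSketch}}, both method names, the array-based input and the -1 sentinels are
all hypothetical; it also assumes the keys arrive sorted by original write id first. It
only contrasts the quoted condition with a check on the write id alone.

```java
// Illustrative only: a stand-in for the counting logic, not the Hive implementation.
final class DistinctOwidSketch {
    private DistinctOwidSketch() {}

    // Mirrors the quoted condition: a change in EITHER field bumps the counter.
    static int countWithBucketCheck(long[] owids, int[] bucketProps) {
        int distinct = 0;
        long lastSeenOwid = -1;          // assumed sentinel: real write ids are non-negative
        int lastSeenBucketProperty = -1; // assumed sentinel for this sketch only
        for (int i = 0; i < owids.length; i++) {
            if (lastSeenOwid != owids[i] || lastSeenBucketProperty != bucketProps[i]) {
                lastSeenOwid = owids[i];
                lastSeenBucketProperty = bucketProps[i];
                distinct++;
            }
        }
        return distinct;
    }

    // Counts a new entry only when the original write id itself changes.
    static int countOwidOnly(long[] owids) {
        int distinct = 0;
        long lastSeenOwid = -1; // same assumed sentinel as above
        for (long owid : owids) {
            if (owid != lastSeenOwid) {
                lastSeenOwid = owid;
                distinct++;
            }
        }
        return distinct;
    }
}
```

For write ids 5, 5, 5, 7 with bucket properties 1, 1, 2, 1, the first method reports 3
while only 2 distinct write ids are present.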

> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>                 Key: HIVE-19838
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19838.01.patch, HIVE-19838.patch
> Apparently sometimes the delete count in ACID stats doesn't match what merger actually
> returns.
> It could be due to some deltas having duplicate deletes from parallel queries (I guess?)
> that are being squashed by the merger or some other reasons beyond my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it fails with
> array index exception. Also, it could actually be done in a single loop.
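
For illustration, a single load loop that does not trust the expected count could look
roughly like the sketch below. This is not the patch: {{DeleteLoadSketch}},
{{loadRowIds}} and the plain {{Iterator<Long>}} standing in for the merger are all
hypothetical. The expected count is used only as an initial capacity hint, and the array
grows or is trimmed to match what is actually returned.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative only: a hypothetical single-pass loader, not the Hive implementation.
final class DeleteLoadSketch {
    private DeleteLoadSketch() {}

    // 'merger' stands in for whatever yields delete events; 'expectedCount' is a hint
    // (for example from stats) that may be too small or too large.
    static long[] loadRowIds(Iterator<Long> merger, int expectedCount) {
        long[] rowIds = new long[Math.max(expectedCount, 1)];
        int n = 0;
        while (merger.hasNext()) {
            if (n == rowIds.length) {
                // More records than expected: grow instead of indexing out of bounds.
                rowIds = Arrays.copyOf(rowIds, rowIds.length * 2);
            }
            rowIds[n++] = merger.next();
        }
        // Fewer records than capacity: trim so the length equals the real count.
        return n == rowIds.length ? rowIds : Arrays.copyOf(rowIds, n);
    }
}
```

With this shape a stats count that is too low costs a reallocation rather than an
exception, and a count that is too high costs one trimming copy at the end.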

This message was sent by Atlassian JIRA
