hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <>
Subject [jira] [Commented] (HIVE-19838) simplify & fix ColumnizedDeleteEventRegistry load loop
Date Tue, 12 Jun 2018 00:42:00 GMT


Eugene Koifman commented on HIVE-19838:

I left a couple of nits on RB.
Ignore my previous comment about distinctOwids.  It's a poorly named variable: it really
counts the number of distinct (writeid, bucketproperty) pairs, and the search on CompressedOwid
matches this.
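
To illustrate that reading, here is a rough sketch, not the actual Hive code. The class CompressedRunsSketch, its Run type and all field names are made up for this example. It assumes the delete events are already sorted by (writeid, bucketproperty, rowid), collapses them into one run per distinct (writeid, bucketproperty) pair, and looks an event up by searching the runs first and then the row ids inside the matching run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CompressedRunsSketch {
    /** Hypothetical run: consecutive events sharing one (writeId, bucketProperty) pair. */
    static final class Run {
        final long writeId;
        final int bucketProperty;
        final int fromIndex; // inclusive index into rowIds
        int toIndex;         // inclusive index into rowIds

        Run(long writeId, int bucketProperty, int fromIndex) {
            this.writeId = writeId;
            this.bucketProperty = bucketProperty;
            this.fromIndex = fromIndex;
            this.toIndex = fromIndex;
        }
    }

    final long[] rowIds;
    final List<Run> runs = new ArrayList<>();

    /** Parallel arrays, assumed sorted by (writeId, bucketProperty, rowId). */
    CompressedRunsSketch(long[] writeIds, int[] buckets, long[] rowIds) {
        this.rowIds = rowIds;
        Run last = null;
        for (int i = 0; i < rowIds.length; i++) {
            // Start a new run whenever the (writeId, bucketProperty) pair changes.
            if (last == null || last.writeId != writeIds[i] || last.bucketProperty != buckets[i]) {
                last = new Run(writeIds[i], buckets[i], i);
                runs.add(last);
            }
            last.toIndex = i;
        }
    }

    /** Number of distinct (writeId, bucketProperty) pairs seen. */
    int distinctPairs() {
        return runs.size();
    }

    boolean isDeleted(long writeId, int bucketProperty, long rowId) {
        int lo = 0;
        int hi = runs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Run r = runs.get(mid);
            int c = Long.compare(r.writeId, writeId);
            if (c == 0) {
                c = Integer.compare(r.bucketProperty, bucketProperty);
            }
            if (c < 0) {
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                // Row ids inside a run are sorted, so a second binary search suffices.
                return Arrays.binarySearch(rowIds, r.fromIndex, r.toIndex + 1, rowId) >= 0;
            }
        }
        return false;
    }
}
```

In this sketch the number of runs is exactly the number of distinct pairs, which is the quantity the variable is tracking.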

Note to self:
For unbucketed tables, if multiple bucket files are all loaded, each file has its own reader
in the heap, which means that regardless of how delete events are spread among files, the heap
sorts all of them by (writeid, bucketprop, rowid), so ColumnizedDeleteEventRegistry.isDeleted()
looks ok.
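
The argument above is the usual k-way merge property. Below is a minimal sketch of it under stated assumptions: each file is modelled as an in-memory list that is already sorted, and DeleteEventMergeSketch, Key, Cursor and mergeSorted are hypothetical names, not Hive classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class DeleteEventMergeSketch {
    /** Hypothetical delete-event key; the field names are illustrative only. */
    static final class Key {
        final long writeId;
        final int bucketProperty;
        final long rowId;

        Key(long writeId, int bucketProperty, long rowId) {
            this.writeId = writeId;
            this.bucketProperty = bucketProperty;
            this.rowId = rowId;
        }
    }

    /** Ordering by (writeId, bucketProperty, rowId). */
    static final Comparator<Key> ORDER =
        Comparator.<Key>comparingLong(k -> k.writeId)
            .thenComparingInt(k -> k.bucketProperty)
            .thenComparingLong(k -> k.rowId);

    /** One heap entry per file: its current head plus the rest of its events. */
    private static final class Cursor {
        Key head;
        final Iterator<Key> rest;

        Cursor(Iterator<Key> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    /** Merges per-file lists, each assumed to be sorted by ORDER already. */
    static List<Key> mergeSorted(List<List<Key>> files) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> ORDER.compare(a.head, b.head));
        for (List<Key> file : files) {
            Iterator<Key> it = file.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }
        List<Key> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();   // smallest head across all files
            out.add(c.head);
            if (c.rest.hasNext()) {   // advance that file and put it back
                c.head = c.rest.next();
                heap.add(c);
            }
        }
        return out;
    }
}
```

Whichever file an event lives in, it leaves the heap in global key order, so a consumer that expects sorted input sees it.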

> simplify & fix ColumnizedDeleteEventRegistry load loop
> ------------------------------------------------------
>                 Key: HIVE-19838
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19838.01.patch, HIVE-19838.patch
> Apparently sometimes the delete count in ACID stats doesn't match what merger actually
> returns.
> It could be due to some deltas having duplicate deletes from parallel queries (I guess?)
> that are being squashed by the merger or some other reasons beyond my mortal comprehension.
> The loop assumes the merger will return the exact number of records, so it fails with
> array index exception. Also, it could actually be done in a single loop.
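
For illustration only, a single-loop load that treats the count from stats as a hint rather than a guarantee might look like the sketch below. This is not necessarily what the attached patches do. SingleLoopLoadSketch and load are hypothetical names, and the merger is modelled as a plain Iterator<Long> over row ids.

```java
import java.util.Arrays;
import java.util.Iterator;

final class SingleLoopLoadSketch {
    /**
     * Reads every record the merger yields in one pass.
     * estimatedCount (e.g. from stats) only sizes the initial buffer.
     */
    static long[] load(Iterator<Long> merged, int estimatedCount) {
        long[] rowIds = new long[Math.max(estimatedCount, 1)];
        int n = 0;
        while (merged.hasNext()) {
            if (n == rowIds.length) {
                // More records than the estimate: grow instead of overrunning.
                rowIds = Arrays.copyOf(rowIds, rowIds.length * 2);
            }
            rowIds[n++] = merged.next();
        }
        // Fewer records than the estimate: trim to what was actually read.
        return n == rowIds.length ? rowIds : Arrays.copyOf(rowIds, n);
    }
}
```

The returned array length is the number of records actually read, so later code can index by that instead of by the stats count.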

This message was sent by Atlassian JIRA
