hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-19124) implement a basic major compactor for MM tables
Date Fri, 06 Apr 2018 22:41:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429094#comment-16429094 ]

Sergey Shelukhin edited comment on HIVE-19124 at 4/6/18 10:40 PM:
------------------------------------------------------------------

The watermark issue, for the compactor specifically, could probably be addressed by:
1) Modifying the txn list in recordValidWriteIds in Driver based on the flag. We create the
driver ourselves, so the flag can be set directly, no shenanigans necessary. Compactor write IDs
can be created and serialized so that the query reads only the data we want.
2) When renaming the directory, generating the final name in the commit method, AFTER the query,
based on the write IDs that the driver actually used.

That way we don't need any UDFs or INPUT_FILE_NAME stuff, and it should just work.
I'm not sure I'll have enough time to finish this today, and I'm out next week, but I'll attach
a WIP patch.
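
To make the two steps concrete, here is a rough sketch in Python rather than Hive's Java. It is not the patch: WriteIdList, SketchDriver, compactor_write_ids and final_base_dir_name are hypothetical names, capping the watermark just below the lowest open write ID is an assumed policy, and the zero-padded base_ directory name is an assumed naming convention.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class WriteIdList:
    """Hypothetical stand-in for a valid-write-ID list: every write ID at or
    below high_watermark is readable, except the listed exceptions."""
    high_watermark: int
    exceptions: FrozenSet[int] = frozenset()

    def is_readable(self, write_id: int) -> bool:
        return write_id <= self.high_watermark and write_id not in self.exceptions


def compactor_write_ids(current: WriteIdList, open_ids: Iterable[int]) -> WriteIdList:
    """Assumed policy: cap the watermark just below the lowest open write ID,
    so the compaction query never reads in-progress data."""
    open_ids = list(open_ids)
    capped = min(open_ids) - 1 if open_ids else current.high_watermark
    capped = min(capped, current.high_watermark)
    kept = frozenset(w for w in current.exceptions if w <= capped)
    return WriteIdList(capped, kept)


class SketchDriver:
    """Hypothetical driver; only the write-ID handling is modelled."""

    def __init__(self, compactor_ids: Optional[WriteIdList] = None):
        # The "flag" from step 1: whoever creates the driver sets it directly.
        self.compactor_ids = compactor_ids
        self.used_ids: Optional[WriteIdList] = None

    def record_valid_write_ids(self, current: WriteIdList) -> WriteIdList:
        # Step 1: in compaction mode, use the compactor's list instead of
        # the current snapshot.
        if self.compactor_ids is not None:
            self.used_ids = self.compactor_ids
        else:
            self.used_ids = current
        return self.used_ids

    def run(self, current: WriteIdList, delta_write_ids: Iterable[int]) -> List[int]:
        ids = self.record_valid_write_ids(current)
        return [w for w in delta_write_ids if ids.is_readable(w)]


def final_base_dir_name(driver: SketchDriver) -> str:
    """Step 2: called from commit, AFTER the query, so the name reflects the
    write IDs the driver actually used."""
    if driver.used_ids is None:
        raise RuntimeError("query has not recorded its write IDs yet")
    return f"base_{driver.used_ids.high_watermark:07d}"


# Example: write ID 4 was aborted, 7 is still open, 10 is the current watermark.
current = WriteIdList(high_watermark=10, exceptions=frozenset({4, 7}))
driver = SketchDriver(compactor_ids=compactor_write_ids(current, open_ids={7}))
read = driver.run(current, range(1, 11))
base_name = final_base_dir_name(driver)
```

In this example the query reads write IDs 1, 2, 3, 5 and 6, and the directory is named from watermark 6, the one the query really ran with, not from a value guessed before the query started.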

For insert overwrite outside of compaction this won't work, because we do need to overwrite
deltas above the watermark that have already committed, but not the ones still in progress, so
the base would need to be discontinuous. For compaction we don't need that.
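
A small illustration of the discontinuity, again as a Python sketch with made-up write IDs. The helper contiguous_ranges and both scope functions are hypothetical, and the watermark is assumed to sit just below the lowest in-progress write ID.

```python
from typing import Iterable, List, Set, Tuple


def contiguous_ranges(write_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse write IDs into inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for w in sorted(set(write_ids)):
        if ranges and w == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], w)
        else:
            ranges.append((w, w))
    return ranges


def compaction_scope(committed: Set[int], in_progress: Set[int]) -> List[int]:
    """Compaction only needs committed data at or below the watermark."""
    watermark = min(in_progress) - 1 if in_progress else max(committed)
    return sorted(w for w in committed if w <= watermark)


def insert_overwrite_scope(committed: Set[int], in_progress: Set[int]) -> List[int]:
    """Insert overwrite must replace every committed delta, including those
    above the watermark, while leaving in-progress ones alone."""
    return sorted(committed - in_progress)


committed = {1, 2, 3, 4, 5, 6, 8, 9}  # 8 and 9 committed above the watermark
in_progress = {7}                     # still open, so the watermark is 6

compaction = contiguous_ranges(compaction_scope(committed, in_progress))
overwrite = contiguous_ranges(insert_overwrite_scope(committed, in_progress))
```

Here compaction covers the single range 1 to 6, which one base directory can represent. Insert overwrite would have to cover 1 to 6 and 8 to 9 while skipping 7, which no single continuous base can express.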



> implement a basic major compactor for MM tables
> -----------------------------------------------
>
>                 Key: HIVE-19124
>                 URL: https://issues.apache.org/jira/browse/HIVE-19124
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>              Labels: mm-gap-2
>         Attachments: HIVE-19124.01.patch, HIVE-19124.patch
>
>
> For now, it will run a query directly and only major compactions will be supported.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
