hive-issues mailing list archives

From "Abhishek Somani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13479) Relax sorting requirement in ACID tables
Date Tue, 19 Mar 2019 04:57:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795678#comment-16795678 ]

Abhishek Somani commented on HIVE-13479:
----------------------------------------

[~ekoifman] [~vgumashta] [~gopalv]

Do we have any plans to work on this?

> Relax sorting requirement in ACID tables
> ----------------------------------------
>
>                 Key: HIVE-13479
>                 URL: https://issues.apache.org/jira/browse/HIVE-13479
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>    Affects Versions: 1.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>   Original Estimate: 160h
>  Remaining Estimate: 160h
>
> Currently ACID tables require data to be sorted according to the internal primary key. This is so that base + delta files can be efficiently sort-merged to produce the snapshot for the current transaction.
> This prevents the user from sorting the table by any other criteria, which can be useful. One example is a dynamic partition insert (which also occurs for UPDATE/DELETE SQL). This may create lots of writers (buckets * partitions) and tax cluster resources.
> The usual solution is hive.optimize.sort.dynamic.partition=true, which is not honored for ACID tables.
> We could rely on a hash-table-based algorithm to merge delta files and then not require any particular sort order on ACID tables. One way to do that is to treat each update event as an insert (new internal PK) + delete (old PK). Delete events are very small since they only need to contain PKs, so the hash table would only need to hold delete events and would be reasonably memory efficient.
> This is a significant amount of work but worth doing.
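The hash-table merge proposed in the description can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not Hive code: the `(write_id, bucket, row_id)` tuple only loosely models the internal primary key, files are modeled as in-memory lists, and `InsertEvent`, `DeleteEvent`, `split_update` and `hash_merge` are hypothetical names. It illustrates that once an update is split into delete(old PK) + insert(new PK), only delete PKs have to be held in memory and neither base nor delta data has to be sorted.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set, Tuple, Union

# Hypothetical stand-in for the internal primary key, loosely modeled on
# Hive's ROW__ID: (write_id, bucket, row_id). Names are illustrative only.
RowId = Tuple[int, int, int]


@dataclass(frozen=True)
class InsertEvent:
    row_id: RowId   # internal PK assigned when the row is written
    data: dict      # user-visible column values


@dataclass(frozen=True)
class DeleteEvent:
    row_id: RowId   # a delete carries only the PK, so it is small


Event = Union[InsertEvent, DeleteEvent]


def split_update(old_id: RowId, new_id: RowId,
                 new_data: dict) -> Tuple[DeleteEvent, InsertEvent]:
    """Represent an UPDATE as delete(old PK) + insert(new PK)."""
    return DeleteEvent(old_id), InsertEvent(new_id, new_data)


def hash_merge(base: Iterable[InsertEvent],
               deltas: List[List[Event]]) -> Iterator[InsertEvent]:
    """Yield the rows of the current snapshot; no input needs to be sorted.

    Pass 1 builds a hash set holding only the PKs of delete events.
    Pass 2 streams every insert (base first, then deltas) and skips any
    row whose PK is in that set. `deltas` is scanned twice, the way
    delta files on disk could simply be re-read.
    """
    deleted: Set[RowId] = {
        e.row_id for delta in deltas for e in delta
        if isinstance(e, DeleteEvent)
    }
    for e in base:
        if e.row_id not in deleted:
            yield e
    for delta in deltas:
        for e in delta:
            if isinstance(e, InsertEvent) and e.row_id not in deleted:
                yield e


# Usage: base has two rows; txn 2 updates the first, txn 3 deletes the second.
base = [InsertEvent((1, 0, 0), {"id": 1, "v": "a"}),
        InsertEvent((1, 0, 1), {"id": 2, "v": "b"})]
del_old, ins_new = split_update((1, 0, 0), (2, 0, 0), {"id": 1, "v": "a2"})
delta_txn2 = [ins_new, del_old]        # order within a delta is irrelevant
delta_txn3 = [DeleteEvent((1, 0, 1))]
snapshot = sorted(r.data["v"] for r in hash_merge(base, [delta_txn2, delta_txn3]))
print(snapshot)  # ['a2']
```

Because every insert receives a fresh PK, a PK that has been deleted can never reappear, so the order in which files or events are read does not change the result. The trade-off is that the set of deleted PKs must fit in memory, which the description argues is reasonable since delete events contain only PKs.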



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
