hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10924) add support for MERGE statement
Date Fri, 28 Aug 2015 18:00:46 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720334#comment-14720334
] 

Eugene Koifman commented on HIVE-10924:
---------------------------------------

h3. Feature design notes
Hive supports [multi-insert statement|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries].
 The idea is that you can execute a select statement and split the result stream into several
to write to multiple targets.

This matches very closely to what MERGE statement needs to do.
When modeling MERGE as multi-insert, we'd split the stream into 2 stream, 1 for the insert
part, 1 for update part but write both results to the same table.  
Section 14.12 of  ISO/IEC 9075-2:2011(E) (SQL 2011) defines MERGE statement.   

Suppose we have tables 
{code:SQL}
CREATE TABLE target(a int, b int, c int);
CREATE TABLE source(x int, y int, z int);
{code}

Then an example that covers most possibilities might look like this
{code:SQL}
MERGE INTO target 
USING source ON b = y
WHEN MATCHED AND c + 1 + z > 0
THEN THEN UPDATE SET a = 1, c = z
WHEN NOT MATCHED AND z IS NULL
THEN INSERT(a,b) VALUES(z, 7)
{code}
\\
\\
And is interpreted as follows
\\
\\
|| Line || Statement Part || Notes ||
| 1 | {code:SQL} MERGE INTO target {code} | Specifies the table being modified |
| 2 | {code:SQL} USING source {code} | specifies the source of the data which may be a table
or expression such as SELECT … FROM … |
| 3 | {code:SQL}     ON b = y {code} | is interpreted like exactly like an ON clause of a
JOIN between source and target. |
| 4 | {code:SQL} WHEN MATCHED {code} | Applies if expr in ON is true |
| 5 | {code:SQL}     AND c + 1 + z > 0 {code} | Additional predicate to test before performing
the action. |
| 6 | {code:SQL}     THEN UPDATE SET a = 1, c = z {code} | May be UPDATE or DELETE.  The later
deletes the row from target.  SET clause is exactly like in regular UPDATE stmt. |
| 7 | {code:SQL} WHEN NOT MATCHED {code} | Applies if expr in ON is false |
| 8 | {code:SQL}     AND z IS NULL {code} | Additional predicate to test before performing
the action. |
| 9 | {code:SQL} THEN INSERT(a,b) VALUES(z, 7){code} | Insert to perform on target. |
\\
\\

Then the "equivalent" _multi-insert statement_ looks like this:
\\
\\
|| Statement Part ||  Refernce to previous table ||
| {code:SQL} FROM (SELECT * FROM target RIGHT OUTER JOIN SOURCE ON b = y) {code} | Lines 1
- 3 | 
| {code:SQL} INSERT INTO target(a,c) SELECT 1, z {code} | This represents the update part
of merge; Line 6 |
| {code:SQL} WHERE c + 1 + z > 0 {code} | Line 5 |
| {code:SQL} AND b = y {code} | Only include ‘matched’ rows; Line 4 |
| {code:SQL} INSERT INTO target(a,b) SELECT z, 7 {code} | This represents the ‘insert’
part of merge; Line 9 |
| {code:SQL} WHERE z IS NULL {code} | Line 8 |
| {code:SQL} AND a = null AND b = null AND c = null; {code} | Only include ‘not matched’
rows; Line 7 |

h4. Some caveats
# Current multi-insert doesn’t support writing to the same table more than once.  Can we
fix this?
# This requires the same change as for multi-statement txn, that is to support multiple delta
files per transaction. (HIVE-11030)
# Requires annotating each insert (of multi-insert) with whether it’s doing update/delete
or insert


Since Hive can already compile an operator pipeline for such a _multi-insert statement_ (almost)
support for MERGE doesn't require additional operators.
Also, Update/Delete are actually compiled int Insert statements.

> add support for MERGE statement
> -------------------------------
>
>                 Key: HIVE-10924
>                 URL: https://issues.apache.org/jira/browse/HIVE-10924
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor, Transactions
>    Affects Versions: 1.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> add support for 
> MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message