sqoop-dev mailing list archives

From "Attila Szabo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column
Date Wed, 06 Dec 2017 01:20:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279496#comment-16279496 ]

Attila Szabo commented on SQOOP-3267:
-------------------------------------

Hey [~dvoros],

I've checked your proposed changes and have become a bit concerned: according to your
patch, Sqoop's behavior would not differ depending on the mode given on the command line.

I do understand that in "lastmodified" mode it would make sense to delete all the previous
versions of the column (though I guess in that case we should delete the previous column
values even for "normal" updates).

But IMHO in "append" mode we should only delete the last version of the column, to keep
the history, as the mode itself suggests.
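
To make the distinction concrete, here is a minimal sketch of the two delete behaviors on a
toy in-memory model of a single versioned cell. It is not Sqoop or HBase code: the class
VersionedCellSketch and the applyNull helper are hypothetical names used only for
illustration. The two delete methods mirror the documented semantics of HBase's
Delete.addColumn (latest version only) and Delete.addColumns (all versions), and the mode
switch is just one possible reading of the proposal above.

```java
import java.util.TreeMap;

// Toy model of one versioned cell; a sketch only, not Sqoop or HBase code.
class VersionedCellSketch {
    // timestamp -> value, ordered by timestamp (newest is the last entry)
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Same effect as HBase's Delete.addColumn: drop only the newest version.
    void deleteLatestVersion() {
        if (!versions.isEmpty()) {
            versions.pollLastEntry();
        }
    }

    // Same effect as HBase's Delete.addColumns: drop every version.
    void deleteAllVersions() {
        versions.clear();
    }

    // What a reader asking for the newest version would see; null if nothing is left.
    String read() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Hypothetical mode switch: "lastmodified" wipes the column,
    // any other mode (e.g. "append") removes only the newest version.
    static void applyNull(VersionedCellSketch cell, String incrementalMode) {
        if ("lastmodified".equals(incrementalMode)) {
            cell.deleteAllVersions();
        } else {
            cell.deleteLatestVersion();
        }
    }

    public static void main(String[] args) {
        VersionedCellSketch appendCell = new VersionedCellSketch();
        appendCell.put(1L, "old");
        appendCell.put(2L, "new");
        applyNull(appendCell, "append");
        System.out.println("append: " + appendCell.read());

        VersionedCellSketch lastModifiedCell = new VersionedCellSketch();
        lastModifiedCell.put(1L, "old");
        lastModifiedCell.put(2L, "new");
        applyNull(lastModifiedCell, "lastmodified");
        System.out.println("lastmodified: " + lastModifiedCell.read());
    }
}
```

In the "append" case the older value becomes visible again after the delete, which is the
history-preserving behavior argued for above; in the "lastmodified" case the column reads
as empty.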

What do you think?
[~maugli]

> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>
>                 Key: SQOOP-3267
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3267
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>    Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>         Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a
> column when the corresponding cell was set to NULL in the source table.
> This can lead to unexpected and misleading results if the row has been transferred multiple
> times, which can easily happen if it's being modified on the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per
> row as before. This could probably lead to a performance drop for wide tables (for which HBase
> is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be the expected
> behavior here?
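
The per-column versus per-row concern in the description can be illustrated with a small
counting sketch. This is a toy model, not the SQOOP-3149 code: RowMutationSketch,
ToyMutation, perColumn and perRow are hypothetical names, and grouping NULL columns into one
delete per row is an assumption about how a batched variant might look, not something the
patch is known to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model comparing mutation counts; a sketch only, not Sqoop or HBase code.
class RowMutationSketch {
    static final class ToyMutation {
        final String kind;   // "PUT" or "DELETE"
        final String rowKey;
        final Map<String, String> columns = new LinkedHashMap<>();

        ToyMutation(String kind, String rowKey) {
            this.kind = kind;
            this.rowKey = rowKey;
        }
    }

    // One mutation per column: a wide row produces as many mutations as it has columns.
    static List<ToyMutation> perColumn(String rowKey, Map<String, String> row) {
        List<ToyMutation> out = new ArrayList<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String kind = (e.getValue() == null) ? "DELETE" : "PUT";
            ToyMutation m = new ToyMutation(kind, rowKey);
            m.columns.put(e.getKey(), e.getValue());
            out.add(m);
        }
        return out;
    }

    // Grouped per row: at most one PUT (non-NULL columns) and one DELETE (NULL columns).
    static List<ToyMutation> perRow(String rowKey, Map<String, String> row) {
        ToyMutation put = new ToyMutation("PUT", rowKey);
        ToyMutation delete = new ToyMutation("DELETE", rowKey);
        for (Map.Entry<String, String> e : row.entrySet()) {
            ToyMutation target = (e.getValue() == null) ? delete : put;
            target.columns.put(e.getKey(), e.getValue());
        }
        List<ToyMutation> out = new ArrayList<>();
        if (!put.columns.isEmpty()) {
            out.add(put);
        }
        if (!delete.columns.isEmpty()) {
            out.add(delete);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "alice");
        row.put("email", null);   // NULL in the source table
        row.put("city", "Budapest");
        System.out.println("per column: " + perColumn("row1", row).size() + " mutations");
        System.out.println("per row:    " + perRow("row1", row).size() + " mutations");
    }
}
```

In this model the per-column count grows linearly with the number of columns, while the
grouped variant never exceeds two mutations per row, which is why wide tables are the case
the description worries about.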



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
