sqoop-dev mailing list archives

From "Boglarka Egyed (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column
Date Tue, 05 Dec 2017 08:55:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278220#comment-16278220 ]

Boglarka Egyed commented on SQOOP-3267:

Hi [~dvoros],

Thanks for reporting this issue!

Currently we use [Apache's Review Board|https://reviews.apache.org/r/59833/] for reviewing
patches, so please open a Review Request there.

Please consider the following on Review Board:
* Project: Sqoop
* Summary: use the issue's JIRA key followed by its JIRA title
* Groups: add the relevant group so everyone on the project will know about your patch (sqoop)
* Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
* Repository: sqoop-trunk for Sqoop1 (or sqoop-sqoop2 for Sqoop2)

Please add the link to the review to this JIRA issue as an external/web link.

If you would like to assign this ticket to yourself, please let me know and I'll add you to
the Contributors list.

Thank you,

> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>                 Key: SQOOP-3267
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3267
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>    Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>         Attachments: SQOOP-3267.1.patch
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.
> This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be the expected behavior here?
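
The stale-value problem described in the issue can be sketched with a toy model. This is not Sqoop or HBase client code: `VersionedRow`, its methods, the `cf:name` column and the sample values are all hypothetical, and real HBase deletes write tombstone markers rather than removing data in place. The sketch only shows why deleting just the newest version of a column makes an older value visible again once a row has been imported more than once.

```python
from collections import defaultdict


class VersionedRow:
    """Toy stand-in for one HBase row (hypothetical, not the HBase API).

    Each column keeps a list of (timestamp, value) versions, oldest first.
    """

    def __init__(self):
        self.cells = defaultdict(list)

    def put(self, column, timestamp, value):
        # Store a new version and keep versions ordered by timestamp.
        self.cells[column].append((timestamp, value))
        self.cells[column].sort()

    def delete_latest_version(self, column):
        # Drop only the newest version; any older version becomes visible again.
        if self.cells[column]:
            self.cells[column].pop()

    def delete_all_versions(self, column):
        # Drop every stored version of the column.
        self.cells.pop(column, None)

    def get(self, column):
        # Return the newest visible value, or None if the column is empty.
        versions = self.cells.get(column)
        return versions[-1][1] if versions else None


def import_twice_then_null(delete):
    """Simulate two imports of a changing row, then an import where the source cell is NULL."""
    row = VersionedRow()
    row.put("cf:name", 1, "alice")   # first import
    row.put("cf:name", 2, "alicia")  # second import, after the source row changed
    delete(row, "cf:name")           # third import: the source cell is now NULL
    return row.get("cf:name")


stale = import_twice_then_null(VersionedRow.delete_latest_version)
clean = import_twice_then_null(VersionedRow.delete_all_versions)
print(stale)  # the value from the first import resurfaces
print(clean)  # no value remains for the column
```

In the HBase client API the same distinction exists between `Delete.addColumn` (latest version only) and `Delete.addColumns` (all versions); whether the attached patch uses those calls is not stated in this message.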

This message was sent by Atlassian JIRA
