sqoop-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3267) Incremental import to HBase deletes only last version of column
Date Thu, 22 Feb 2018 14:35:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372867#comment-16372867 ]

Hudson commented on SQOOP-3267:
-------------------------------

SUCCESS: Integrated in Jenkins build Sqoop-hadoop200 #1149 (See [https://builds.apache.org/job/Sqoop-hadoop200/1149/])
SQOOP-3267: Incremental import to HBase deletes only last version of (vasas: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=69463f0b3ed3af28581202ef59079b9df7bc0bad])
* (edit) src/docs/user/hbase.txt
* (edit) src/java/org/apache/sqoop/hbase/HBasePutProcessor.java
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java
* (edit) src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
* (edit) src/docs/man/hbase-args.txt
* (edit) src/java/org/apache/sqoop/SqoopOptions.java
* (edit) src/test/org/apache/sqoop/hbase/HBaseTestCase.java
* (edit) src/java/org/apache/sqoop/mapreduce/HBaseImportJob.java
* (edit) src/test/org/apache/sqoop/hbase/HBaseImportTest.java
* (edit) src/docs/user/hbase-args.txt
* (edit) src/test/org/apache/sqoop/TestSqoopOptions.java


> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>
>                 Key: SQOOP-3267
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3267
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>    Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>            Priority: Major
>             Fix For: 1.5.0
>
>         Attachments: SQOOP-3267.1.patch, SQOOP-3267.2.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.
> This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be the expected behavior here?
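
The behavior described in the issue can be illustrated with a toy model of a single versioned cell. This is only a sketch in plain Java, not Sqoop or HBase code: the class `VersionedCellDemo` and its methods are hypothetical names invented for the illustration, and the model only mimics what a reader would see. In the HBase client API the two cases roughly correspond to `Delete.addColumn` (latest version only) and `Delete.addColumns` (all versions).

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of one HBase cell: timestamp -> value.
// It imitates only the visible read behavior, not HBase internals.
public class VersionedCellDemo {
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    // Store a value under the given timestamp (one version of the cell).
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Return the newest remaining value, or null if no version is left.
    public String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    // Remove only the newest version; an older one becomes visible again.
    public void deleteLatestVersion() {
        versions.pollLastEntry();
    }

    // Remove every version, so the cell reads as absent.
    public void deleteAllVersions() {
        versions.clear();
    }

    public static void main(String[] args) {
        VersionedCellDemo cell = new VersionedCellDemo();
        cell.put(1L, "first import");   // row transferred once
        cell.put(2L, "second import");  // row modified and transferred again

        // Source column is now NULL, but only the latest version is deleted.
        cell.deleteLatestVersion();
        System.out.println("after deleting latest version: " + cell.get());

        // Deleting all versions gives the result a user would expect for NULL.
        cell.deleteAllVersions();
        System.out.println("after deleting all versions: " + cell.get());
    }
}
```

In this model, after the first delete a read still returns the stale value from the earlier import, which is the "unexpected and misleading" result the description refers to.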



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
