sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Attila Szabo <mau...@apache.org>
Subject Re: Review Request 64333: Incremental import to HBase deletes only last version of column
Date Mon, 19 Feb 2018 11:28:40 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64333/#review197746
-----------------------------------------------------------


Fix it, then Ship it!




Hi,

The changes in the HBaseImportTest are not clean enough yet (please apply the requested cleanup),
but otherwise LGTM!


src/test/org/apache/sqoop/hbase/HBaseImportTest.java
Lines 122-155 (patched)
<https://reviews.apache.org/r/64333/#comment278004>

    This testcase is a bit hard to read for me.
    Nontheless it resues several local variables with different content.
    It also does not provide enough information for the first look what is happening, what
is expected, and why (why vals are set like this, what does updateTable do, why do we expect
that we do after runImport).
    
    Could you please clean this up a little bit?


- Attila Szabo


On Dec. 5, 2017, 9:25 a.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64333/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2017, 9:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3267
>     https://issues.apache.org/jira/browse/SQOOP-3267
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a
column when the corresponding cell was set to NULL in the source table.
> 
> This can lead to unexpected and misleading results if the row has been transferred multiple
times, which can easily happen if it's being modified on the source side.
> 
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per
row as before. This could probably lead to a performance drop for wide tables (for which HBase
is otherwise usually recommended).
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hbase-args.txt afd5c5b 
>   src/docs/user/hbase-args.txt 53040f5 
>   src/docs/user/hbase.txt ab4aedc 
>   src/java/org/apache/sqoop/SqoopOptions.java 73d0757 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 27d6006 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 0bd6169 
>   src/java/org/apache/sqoop/mapreduce/HBaseImportJob.java 33da487 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ce21918 
>   src/test/org/apache/sqoop/TestSqoopOptions.java 16901ca 
>   src/test/org/apache/sqoop/hbase/HBaseImportTest.java 2e73cf3 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 98f8698 
> 
> 
> Diff: https://reviews.apache.org/r/64333/diff/2/
> 
> 
> Testing
> -------
> 
> - all unit tests and thirdparty tests passed
>  - splitted previous test for incremental import into two, to test both modes (ignore,
delete)
>  - tested on a cluster with HBase (with saved jobs as well)
> 
> 
> Thanks,
> 
> daniel voros
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message