sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Attila Szabo <mau...@apache.org>
Subject Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables
Date Fri, 11 May 2018 00:48:14 GMT

This is an automatically generated e-mail. To reply, visit:

Hi Daniel,

I did not have time for a full review yet, though already found a few smaller flaws, and one
conceptional one.

I'd strongly suggest to extract ORC reading/writing only as a subtask as ACID Hive tables.
The clean advantages would be:
We would be able to test ORC file import/export without any Hive related changes, and thus
testing the correctness with PrestoDB, or any other tool which can read read/write ORC files.
What do you think?

Smaller ones:
See inline.


Lines 144-149 (patched)

    Are we sure this is the total exclude list what we need, and no Hive or other library
related "debris" are coming with it?

Lines 62 (patched)

    Shouldn't we aim for the nohive version of 1.4.3?

Lines 47-51 (patched)

    Aren't we missing a super call here?

Lines 53 (patched)

    Occording to the general code guidelines and Joschua Bloch (Effective Java) this is a
very dangerous practice to initialize an object in a non-final initializer. 
    Please correct this!

- Attila Szabo

On May 2, 2018, 12:12 p.m., daniel voros wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> -----------------------------------------------------------
> (Updated May 2, 2018, 12:12 p.m.)
> Review request for Sqoop.
> Bugs: SQOOP-3311
>     https://issues.apache.org/jira/browse/SQOOP-3311
> Repository: sqoop-trunk
> Description
> -------
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID by default.
This will probably result in increased usage of ACID tables and the need to support importing
into ACID tables with Sqoop.
> Currently the only table format supporting full ACID tables is ORC.
> The easiest and most effective way to support importing into these tables would be to
write out files as ORC and keep using LOAD DATA as we do for all other Hive tables (supported
since HIVE-17361).
> Workaround could be to create table as textfile (as before) and then CTAS from that.
This would push the responsibility of creating ORC format to Hive. However it would result
in writing every record twice; in text format and in ORC.
> Note that ORC is only necessary for full ACID tables. Insert-only (aka. micromanaged)
ACID tables can use arbitrary file format.
> Supporting full ACID tables would also be the first step in making "lastmodified" incremental
imports work with Hive.
> Diffs
> -----
>   ivy.xml 6af94d9d 
>   ivy/libraries.properties c44b50bc 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 56d1f577 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> Diff: https://reviews.apache.org/r/66548/diff/7/
> Testing
> -------
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> Thanks,
> daniel voros

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message