sqoop-dev mailing list archives

From Szabolcs Vasas <vasas.szabo...@gmail.com>
Subject Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables
Date Thu, 24 May 2018 13:07:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review203762
-----------------------------------------------------------



Hi Dani,

Thank you for submitting this new feature and sorry for the late review.
I have left some minor comments inline and I suggest adding some more documentation explaining
what exactly is supported with the ORC files.
The implementation suggests that we only support HDFS and Hive import at this moment, so export
is not covered yet. If this is true I think we should emphasize it in the documentation.


src/java/org/apache/sqoop/hive/TableDefWriter.java
Lines 194 (patched)
<https://reviews.apache.org/r/66548/#comment286060>

    I am not that familiar with the Hive CREATE TABLE statement, but as far as I understand,
'STORED AS ORC' basically means that we will use org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat and org.apache.hadoop.hive.ql.io.orc.OrcSerde.
Is that correct?
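    For reference, my understanding of the shorthand, as a plain-Java sketch: the helper
below is hypothetical (not Sqoop's TableDefWriter), it just spells out the explicit
SERDE/INPUTFORMAT/OUTPUTFORMAT clauses that 'STORED AS ORC' is commonly documented to stand for.

```java
public class OrcDdlSketch {
    // Illustrative only: the explicit storage clause equivalent to 'STORED AS ORC'.
    static String orcStorageClause() {
        return "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'\n"
             + "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'\n"
             + "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'";
    }

    public static void main(String[] args) {
        // Print a sample DDL statement using the expanded clause.
        System.out.println("CREATE TABLE t (id INT)\n" + orcStorageClause());
    }
}
```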



src/java/org/apache/sqoop/hive/TableDefWriter.java
Line 197 (original), 200 (patched)
<https://reviews.apache.org/r/66548/#comment286061>

    Does ORC support different compression codecs? If yes, I think we should emphasize in
the documentation (and/or implement a fail-fast check) that Sqoop does not currently support
the compression-codec option with ORC files.
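    A minimal sketch of the fail-fast I have in mind; the class and method below are
hypothetical, not existing Sqoop code, and only the option name mirrors Sqoop's
--compression-codec flag:

```java
public class OrcOptionValidator {
    // Reject the unsupported combination early instead of failing mid-import.
    static void validate(String fileLayout, String compressionCodec) {
        if ("ORC".equalsIgnoreCase(fileLayout) && compressionCodec != null) {
            throw new IllegalArgumentException(
                "--compression-codec is not supported with ORC files yet");
        }
    }

    public static void main(String[] args) {
        validate("ORC", null);          // accepted: no codec requested
        try {
            validate("ORC", "snappy");  // rejected: codec requested with ORC
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```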



src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java
Lines 41 (patched)
<https://reviews.apache.org/r/66548/#comment286065>

    Can we just use NullWritable.get() instead of introducing a field called "nada"?



src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java
Lines 42 (patched)
<https://reviews.apache.org/r/66548/#comment286064>

    Can we make this field private?



src/java/org/apache/sqoop/util/OrcConversionContext.java
Lines 98 (patched)
<https://reviews.apache.org/r/66548/#comment286063>

    typo: tinyiny



src/java/org/apache/sqoop/util/OrcUtil.java
Lines 55 (patched)
<https://reviews.apache.org/r/66548/#comment286062>

    We use Hive types as the ORC schema types here; is this always going to be correct?
    I am not too familiar with the ORC type system; does it support all the Hive data types?
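    To illustrate the question, a self-contained sketch (not Sqoop's actual OrcUtil) that
builds an ORC struct schema string from Hive column name/type pairs. If Hive and ORC type
names agree, the resulting string could be fed to org.apache.orc.TypeDescription.fromString:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class OrcSchemaSketch {
    // Join "name:type" pairs into an ORC-style struct schema string.
    static String structSchema(Map<String, String> hiveColumns) {
        return hiveColumns.entrySet().stream()
            .map(e -> e.getKey() + ":" + e.getValue())
            .collect(Collectors.joining(",", "struct<", ">"));
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("id", "int");
        cols.put("name", "string");
        cols.put("created", "timestamp");
        // prints: struct<id:int,name:string,created:timestamp>
        System.out.println(structSchema(cols));
    }
}
```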



src/test/org/apache/sqoop/TestOrcImport.java
Lines 50 (patched)
<https://reviews.apache.org/r/66548/#comment286066>

    It seems that this test case only covers the HDFS import; can we add test cases that
cover the Hive import too?


- Szabolcs Vasas


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> -----------------------------------------------------------
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
>     https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID by default.
> This will probably result in increased usage of ACID tables and the need to support
> importing into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables would be to
> write out files as ORC and keep using LOAD DATA as we do for all other Hive tables
> (supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then CTAS from that.
> This would push the responsibility of creating the ORC format to Hive. However, it would
> result in writing every record twice: once in text format and once in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (aka micromanaged)
> ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making "lastmodified"
> incremental imports work with Hive.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba4 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 783651a4 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java ee79d8b7 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 16933a82 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/8/
> 
> 
> Testing
> -------
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>

