sqoop-dev mailing list archives

From daniel voros <daniel.vo...@gmail.com>
Subject Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables
Date Wed, 11 Apr 2018 12:04:49 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review200902
-----------------------------------------------------------



Patch #1 is an initial patch that contains the most fundamental changes needed to support ORC importing.
I'll add documentation and extend the tests with third-party tests etc., but I wanted to share
it to get feedback early on.

- daniel voros


On April 11, 2018, 12:02 p.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> -----------------------------------------------------------
> 
> (Updated April 11, 2018, 12:02 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
>     https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID by default.
> This will probably result in increased usage of ACID tables and the need to support importing
> into ACID tables with Sqoop.
> 
> Currently the only table format supporting full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables would be to
> write out files as ORC and keep using LOAD DATA as we do for all other Hive tables (LOAD DATA
> into ACID tables is supported since HIVE-17361).
> 
> A workaround could be to create the table as textfile (as before) and then CTAS from that.
> This would push the responsibility of creating the ORC format to Hive. However, it would result
> in writing every record twice: once in text format and once in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. micromanaged)
> ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making "lastmodified" incremental
> imports work with Hive.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 6be4fa2 
>   ivy/libraries.properties c44b50b 
>   src/java/org/apache/sqoop/SqoopOptions.java 651cebd 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java b7a25b7 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java a5962ba 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b02e4fe 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c0 
>   src/java/org/apache/sqoop/tool/ImportTool.java e992005 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 8bdc3be 
>   src/test/org/apache/sqoop/orm/TestClassWriter.java 0cc07cf 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/1/
> 
> 
> Testing
> -------
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>
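
For readers unfamiliar with the text-then-CTAS workaround mentioned in the quoted description, here is a minimal sketch of the Hive statements it would involve. It is not taken from the patch: the class name, method name, table names and HDFS path are all hypothetical, and it assumes the text staging table has already been created. Whether the resulting ORC table ends up as a full ACID table depends on the Hive configuration (for example the HIVE-18294 default).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: builds the HiveQL for the "import as text, then CTAS to ORC"
// workaround. None of these names come from the Sqoop patch under review.
final class CtasWorkaroundSketch {

    // Returns the statements in the order they would be run in Hive.
    static List<String> statements(String stagingTable, String targetTable, String hdfsDir) {
        return Arrays.asList(
            // 1. Move the text files written by the import into the staging table.
            "LOAD DATA INPATH '" + hdfsDir + "' INTO TABLE `" + stagingTable + "`",
            // 2. Let Hive rewrite every record as ORC (this is the second write).
            "CREATE TABLE `" + targetTable + "` STORED AS ORC AS SELECT * FROM `" + stagingTable + "`",
            // 3. Drop the intermediate text copy.
            "DROP TABLE `" + stagingTable + "`");
    }

    public static void main(String[] args) {
        // Hypothetical table names and path, used only for demonstration.
        for (String s : statements("orders_staging", "orders", "/user/sqoop/orders")) {
            System.out.println(s + ";");
        }
    }
}
```

Steps 1 and 2 each write the full data set, which is the double write the description points out. Writing ORC files directly from the import job would leave only a LOAD DATA step.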

