sqoop-dev mailing list archives

From "Daniel Voros (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-3311) Importing as ORC file to support full ACID Hive tables
Date Fri, 06 Apr 2018 12:41:00 GMT
Daniel Voros created SQOOP-3311:

             Summary: Importing as ORC file to support full ACID Hive tables
                 Key: SQOOP-3311
                 URL: https://issues.apache.org/jira/browse/SQOOP-3311
             Project: Sqoop
          Issue Type: New Feature
          Components: hive-integration
            Reporter: Daniel Voros
            Assignee: Daniel Voros

Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID by default.
This will probably result in increased usage of ACID tables and the need to support importing
into ACID tables with Sqoop.

Currently the only table format supporting full ACID tables is ORC.

The easiest and most effective way to support importing into these tables would be to write
out files as ORC and keep using LOAD DATA as we do for all other Hive tables (supported since

A workaround could be to create the table as a text file (as before) and then CTAS from that. This
would push the responsibility of creating the ORC format to Hive. However, it would result in writing
every record twice: once in text format and once in ORC.
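
As a rough illustration of that workaround, the sketch below builds the HiveQL that would follow a regular text-format Sqoop import. The function name and the table names are hypothetical, and it assumes a Hive version that accepts CTAS into a transactional ORC table; it is not what Sqoop does today.

```python
from typing import List


def ctas_workaround_statements(staging_table: str, acid_table: str) -> List[str]:
    """Build HiveQL for the text-staging + CTAS workaround (illustrative only).

    Assumes the rows were already imported into `staging_table` as text,
    the way Sqoop imports into Hive today.
    """
    return [
        # Hive rewrites the staged text rows as ORC into a full ACID table.
        # This is the second write of every record mentioned above.
        f"CREATE TABLE {acid_table} STORED AS ORC "
        f"TBLPROPERTIES ('transactional'='true') "
        f"AS SELECT * FROM {staging_table}",
        # Remove the text copy once the ACID table has been populated.
        f"DROP TABLE {staging_table}",
    ]


if __name__ == "__main__":
    # Hypothetical table names, for demonstration only.
    for statement in ctas_workaround_statements("orders_staging", "orders"):
        print(statement + ";")
```

The extra CREATE TABLE ... AS SELECT step is where the double write comes from, which is the cost that writing ORC files directly from Sqoop would avoid.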

Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. micromanaged) ACID
tables can use an arbitrary file format.

Supporting full ACID tables would also be the first step in making "lastmodified" incremental
imports work with Hive.
