sqoop-dev mailing list archives

From "Markus Kemper (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-3068) Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)
Date Mon, 05 Dec 2016 10:18:58 GMT
Markus Kemper created SQOOP-3068:
------------------------------------

             Summary: Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)
                 Key: SQOOP-3068
                 URL: https://issues.apache.org/jira/browse/SQOOP-3068
             Project: Sqoop
          Issue Type: Improvement
          Components: hive-integration
    Affects Versions: 1.4.6
            Reporter: Markus Kemper


Please consider enhancing the error to include more detail and to suggest the workaround (--map-column-java).

Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch between the JSON schema that Hive produces and the JSON schema that Sqoop generates. The test case below demonstrates how to reproduce the issue; a workaround sketch follows the reproduce steps.
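
For illustration, the existing "Expected schema / Actual schema" output could be followed by a hint roughly along these lines (suggested wording only):
{noformat}
Hint: The schema Sqoop generated from the source table does not match the schema of the existing dataset/Hive table.
      Consider overriding the conflicting column mappings with --map-column-java (for example: --map-column-java <column>=String).
{noformat}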

SETUP (create parquet table with Sqoop import and Beeline CTAS)
{noformat}
STEP 01: Create MySQL Tables

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
---------------------------------------------------------------------------------------------
| Field        | Type       | Null | Key | Default            | Extra                       |
---------------------------------------------------------------------------------------------
| c_int        | int(11)    | YES  |     | (null)             |                             |
| c_date       | date       | YES  |     | (null)             |                             |
| c_timestamp  | timestamp  | NO   |     | CURRENT_TIMESTAMP  | on update CURRENT_TIMESTAMP |
---------------------------------------------------------------------------------------------

STEP 02: Insert and Select Row

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
------------------------------------------------------
| c_int       | c_date     | c_timestamp             |
------------------------------------------------------
| 1           | 2016-10-26 | 2016-10-26 14:30:33.0   |
------------------------------------------------------

STEP 03: Create Hive Tables

beeline -u jdbc:hive2:// -e "use default; drop table t1"
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1
beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as select * from t1_text; show create table t1;"

+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE TABLE `t1`(                                 |
|   `c_int` int,                                     |
|   `c_date` string,                                 |
|   `c_timestamp` string)                            |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://<namenode>:8020/user/hive/warehouse/t1' |
| TBLPROPERTIES (                                    |
|   'COLUMN_STATS_ACCURATE'='true',                  |
|   'numFiles'='1',                                  |
|   'numRows'='2',                                   |
|   'rawDataSize'='6',                               |
|   'totalSize'='605',                               |
|   'transient_lastDdlTime'='1478298298')            |
+----------------------------------------------------+--+
{noformat}
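
Note that the Beeline CTAS above leaves c_date and c_timestamp as Hive string columns (see the CREATE TABLE output). For example, the column types can be re-checked with:
{noformat}
beeline -u jdbc:hive2:// -e "use default; describe t1"
{noformat}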

REPRODUCE ISSUE (import + --hive-import, with --append and with --hive-overwrite)
{noformat}

STEP 01: Attempt --hive-import --append

sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --append

16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending.
16/11/02 20:54:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record","name":"t1","fields":[{"name":"c_int","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"c_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"c_timestamp","type":["null","string"],"doc":"Converted from 'string'","default":null}]}
Actual schema: {"type":"record","name":"t1","doc":"Sqoop import of t1","fields":[{"name":"c_int","type":["null","int"],"default":null,"columnName":"c_int","sqlType":"4"},{"name":"c_date","type":["null","long"],"default":null,"columnName":"c_date","sqlType":"91"},{"name":"c_timestamp","type":["null","long"],"default":null,"columnName":"c_timestamp","sqlType":"93"}],"tableName":"t1"}

STEP 02: Attempt --hive-import --hive-overwrite

sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --hive-overwrite

16/11/02 20:56:55 INFO hive.metastore: Connected to metastore.
16/11/02 20:56:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record","name":"t1","fields":[{"name":"c_int","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"c_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"c_timestamp","type":["null","string"],"doc":"Converted from 'string'","default":null}]}
Actual schema: {"type":"record","name":"t1","doc":"Sqoop import of t1","fields":[{"name":"c_int","type":["null","int"],"default":null,"columnName":"c_int","sqlType":"4"},{"name":"c_date","type":["null","long"],"default":null,"columnName":"c_date","sqlType":"91"},{"name":"c_timestamp","type":["null","long"],"default":null,"columnName":"c_timestamp","sqlType":"93"}],"tableName":"t1"}
{noformat}
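
WORKAROUND (sketch)

One possible workaround is to map the DATE and TIMESTAMP columns to Java String with --map-column-java so that the schema Sqoop generates matches the string columns of the existing Hive table. A sketch based on the commands above (the exact column list depends on the table):
{noformat}
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --map-column-java c_date=String,c_timestamp=String --num-mappers 1 --append
{noformat}
With both columns mapped to String, the generated ("Actual") schema should line up with the "Expected" schema reported in the errors above.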



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
