sqoop-dev mailing list archives

From "Alwin James (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3147) Import data to Hive Table in S3 in Parquet format
Date Fri, 21 Apr 2017 00:24:04 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977856#comment-15977856 ]

Alwin James commented on SQOOP-3147:
------------------------------------

[~akamal] Regarding the issue 'Another issue that I noticed is that Sqoop loads the Avro schema
in TBLProperties under avro.schema.literal attribute and if the table has a lot of columns,
the schema would be truncated and this would cause a weird exception like this one.'

Which Hive metastore backend is being used?
It could well be the case that it is hitting the limit on the number of characters that can be
stored in a TBLPROPERTIES value in the Hive metastore.
By default that limit is 4000 characters; you can try increasing it.
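
For reference, on a MySQL-backed metastore the limit usually comes from the PARAM_VALUE column
of the TABLE_PARAMS table, which is VARCHAR(4000) in the stock metastore schema. A rough,
untested sketch of widening it (this assumes a MySQL backend and the default table and column
names; take a backup of the metastore database first):

    -- Run against the Hive metastore database. Assumes a MySQL backend; adjust for other backends.
    -- Widen the table-property value column so a long avro.schema.literal is not truncated.
    ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
    -- SERDE_PARAMS carries the same VARCHAR(4000) limit for SerDe-level properties.
    ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;

You can check whether the stored schema was actually truncated by running
SHOW TBLPROPERTIES <table_name>("avro.schema.literal") in Hive before and after the change.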

> Import data to Hive Table in S3 in Parquet format
> -------------------------------------------------
>
>                 Key: SQOOP-3147
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3147
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ahmed Kamal
>
> Using this command succeeds only if the Hive table's location is in HDFS. If the table is backed by S3, it throws an exception while trying to move the data from the HDFS tmp directory to S3:
> Job job_1486539699686_3090 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Dataset merge failed
> 	at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:333)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:56)
> 	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:370)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Dataset merge failed during rename of hdfs://hdfs-path/tmp/dev_kamal/.temp/job_1486539699686_3090/mr/job_1486539699686_3090/0192f987-bd4c-4cb7-836f-562ac483e008.parquet to s3://bucket_name/dev_kamal/address/0192f987-bd4c-4cb7-836f-562ac483e008.parquet
> 	at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:329)
> 	... 7 more
> sqoop import --connect "jdbc:mysql://connectionUrl" --table "tableName" --as-parquetfile --verbose --username=uname --password=pass --hive-import --delete-target-dir --hive-database dev_kamal --hive-table customer_car_type --hive-overwrite -m 150
> Another issue that I noticed is that Sqoop loads the Avro schema in TBLProperties under avro.schema.literal attribute and if the table has a lot of columns, the schema would be truncated and this would cause a weird exception like this one.
> *Exception :*
> 17/03/07 12:13:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://ip-10-0-0-47.eu-west-1.compute.internal:9083
> 17/03/07 12:13:13 INFO hive.metastore: Opened a connection to metastore, current connections: 1
> 17/03/07 12:13:13 INFO hive.metastore: Connected to metastore.
> 17/03/07 12:13:17 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3e9b1010
> 17/03/07 12:13:17 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
>  at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
>  at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> 	at org.apache.avro.Schema$Parser.parse(Schema.java:929)
> 	at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> 	at org.kitesdk.data.DatasetDescriptor$Builder.schemaLiteral(DatasetDescriptor.java:475)
> 	at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:154)
> 	at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:192)
> 	at org.kitesdk.data.Datasets.load(Datasets.java:108)
> 	at org.kitesdk.data.Datasets.load(Datasets.java:165)
> 	at org.kitesdk.data.Datasets.load(Datasets.java:187)
> 	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:78)
> 	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
> 	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
> 	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
> 	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
> 	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
> 	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
> 	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
> 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
> 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
> 	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
>  at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> 	at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
> 	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
> 	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454)
> 	at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:1342)
> 	at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:1330)
> 	at org.codehaus.jackson.impl.ReaderBasedParser.getText(ReaderBasedParser.java:200)
> 	at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:203)
> 	at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:224)
> 	at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:200)
> 	at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
> 	at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
> 	at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
> 	at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1344)
> 	at org.apache.avro.Schema$Parser.parse(Schema.java:927)
> 	... 21 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
