sqoop-user mailing list archives

From <Saif.A.Ell...@wellsfargo.com>
Subject Help sqoop import parquet hive
Date Thu, 08 Dec 2016 14:26:13 GMT
Hello all,

I am currently struggling to ingest data from Teradata into Hive on HDFS in Parquet format.
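For context, the full import I am attempting looks roughly like this (host, credentials, database and table names below are placeholders, not my real values):

    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=mydb \
      --driver com.teradata.jdbc.TeraDriver \
      --username myuser \
      --password-file /user/saif/.teradata.pw \
      --table SOURCE_TABLE \
      --hive-import \
      --create-hive-table \
      --hive-table mydb.source_table \
      --as-parquetfile \
      --target-dir /user/saif/staging/source_table \
      -m 4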

1.      I was expecting Sqoop to create the Hive table automatically, but the import fails with an error along the lines of "Hive table's column schema is missing".
2.      To troubleshoot, I skipped the import and just created the table with sqoop create-hive-table (see the command sketch after this list). That works correctly, although the tool does not accept the --as-parquetfile parameter, so the resulting Hive table is not Parquet-ready.
3.      So I altered the table to switch its storage format to Parquet.
4.      I tried the import again, and this time it fails because the Hive table's InputFormat class is not supported. This looks like the old Parquet support, where you create the Hive table specifying the InputFormat class explicitly, but I don't want to go backwards in time to old Parquet / haven't tried it.
5.      Finally, I tried importing with --as-parquetfile and no Hive options at all, and then created a Hive table with a LOCATION clause pointing at the imported HDFS directory. That fails with "File is not a parquet file: expected magic number", and I get the same error when I try to read the HDFS Parquet folder from Spark.
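For reference, the workaround in steps 2, 3 and 5 was roughly the following (again, all names and paths are placeholders):

    # Step 2: generate the Hive table definition only
    sqoop create-hive-table \
      --connect jdbc:teradata://td-host/DATABASE=mydb \
      --driver com.teradata.jdbc.TeraDriver \
      --username myuser \
      --password-file /user/saif/.teradata.pw \
      --table SOURCE_TABLE \
      --hive-table mydb.source_table

    # Step 3: switch the table's storage format to Parquet
    hive -e "ALTER TABLE mydb.source_table SET FILEFORMAT PARQUET"

    # Step 5: plain Parquet import, then an external table over the files
    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=mydb \
      --driver com.teradata.jdbc.TeraDriver \
      --username myuser \
      --password-file /user/saif/.teradata.pw \
      --table SOURCE_TABLE \
      --as-parquetfile \
      --target-dir /user/saif/parquet/source_table
    # (column list elided)
    hive -e "CREATE EXTERNAL TABLE mydb.source_table_ext (...) STORED AS PARQUET LOCATION '/user/saif/parquet/source_table'"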

I am using every relevant Sqoop parameter, from --hive-table to --create-hive-table, and I have read through the documentation a couple of times. I don't think the problem is in my command line parameters.
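As a sanity check on the "expected magic number" error from step 5: valid Parquet files begin and end with the 4-byte magic PAR1, so inspecting the first bytes of one of the imported files should print PAR1 (the file name below is a placeholder; substitute any data file under the target directory):

    # a valid Parquet file starts (and ends) with the magic bytes PAR1
    hdfs dfs -cat /user/saif/parquet/source_table/part-m-00000.parquet | head -c 4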

Any assistance welcome
Saif

