sqoop-user mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: Unknown dataset URI issues in Sqoop hive import as parquet
Date Thu, 25 Jun 2015 13:57:36 GMT
Make sure HIVE_HOME and HCAT_HOME are set.
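
A quick way to confirm this before re-running the import is sketched below. It is only an illustration: the function name check_home_vars and the example paths are made up here, and the right directories depend on your install. The variables also need to be exported in the same shell that launches sqoop.

```shell
# Report, for each named variable, whether it is set and points at an
# existing directory. Returns non-zero if any check fails.
# (check_home_vars is a hypothetical helper, not part of Sqoop or Hive.)
check_home_vars() {
  rc=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      rc=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory"
      rc=1
    else
      echo "$name=$value"
    fi
  done
  return $rc
}

# Example usage (placeholder paths, adjust for your system):
#   export HIVE_HOME=/usr/lib/hive
#   export HCAT_HOME=/usr/lib/hive-hcatalog
#   check_home_vars HIVE_HOME HCAT_HOME
```

If either variable is reported as unset or not a directory, fix that first and retry the import.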

For the datetime/timestamp issue: this is because Parquet doesn't support
timestamp types yet. Avro schemas apparently support them as of 1.8.0:
https://issues.apache.org/jira/browse/AVRO-739. Try casting to a numeric or
string value first?
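
One way to try the cast is a free-form query import instead of --table. The sketch below only builds the query text. The helper name build_cast_query and the column id are assumptions for illustration; create_date and bats_active come from the original message. The sqoop invocation in the trailing comment uses the standard --query and --target-dir options, but it has not been verified against Sqoop 1.4.6 with --as-parquetfile.

```shell
# Build a SELECT that passes the listed columns through unchanged and
# casts one datetime column to a string (MySQL CAST ... AS CHAR).
# Usage: build_cast_query TABLE CAST_COLUMN [OTHER_COLUMN...]
# (build_cast_query is a hypothetical helper for this example.)
build_cast_query() {
  table=$1
  cast_col=$2
  shift 2
  cols=""
  for c in "$@"; do
    cols="$cols$c, "
  done
  # $CONDITIONS is left literal on purpose: Sqoop substitutes it at run time.
  printf 'SELECT %sCAST(%s AS CHAR) AS %s FROM %s WHERE $CONDITIONS' \
    "$cols" "$cast_col" "$cast_col" "$table"
}

query=$(build_cast_query bats_active create_date id)
echo "$query"

# The query could then be passed to an import along these lines
# (-P prompts for the password instead of putting it on the command line):
#   sqoop import --connect jdbc:mysql://<host>/<database> --username <user> -P \
#     --query "$query" --target-dir /data/gwynniebee_bi/test_pq_bats_active \
#     --as-parquetfile -m 1
```

With the column stored as a string, the table definition on the Hive or Impala side would need to declare create_date as a string type as well.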

-Abe

On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <manirajv06@gmail.com> wrote:

> Hello,
>
> I am running
>
> ./sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
>   --username root --password gwynniebee --table bats_active \
>   --hive-import --hive-database gwynniebee_bi \
>   --hive-table test_pq_bats_active \
>   --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m1
>
> and getting the exception below. I have learned from various sources that
> $HIVE_HOME has to be set properly to avoid this kind of error. In my
> case, the corresponding home directory exists, but the import still throws
> the exception.
>
> 15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
> 15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
> org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
> at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
> at org.kitesdk.data.Datasets.create(Datasets.java:228)
> at org.kitesdk.data.Datasets.create(Datasets.java:307)
> at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
> at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
> at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
> at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
> at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
> at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
> at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
> at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>
> So I tried an alternative: importing to a Parquet file first, without any
> Hive-related options, and then creating a table in Impala that points to
> the same location. That worked, but querying the table throws the error
> below (I think because of the date-related columns).
>
> ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet
> has an incompatible type with the table schema for column create_date.
> Expected type: BYTE_ARRAY.  Actual type: INT64
>
> Then I tried a table without datetime columns, and it works fine in that
> case.
>
> I am using Hive 0.13 and sqoop-1.4.6.bin__hadoop-2.0.4-alpha.
>
> I would prefer the first approach for my requirements. Can anyone please
> help me with this?
>
> Thanks,
> Mani
