sqoop-user mailing list archives

From Manikandan R <maniraj...@gmail.com>
Subject Re: Unknown dataset URI issues in Sqoop hive import as parquet
Date Fri, 26 Jun 2015 07:51:45 GMT
It should be the same; I have created many tables in Hive before and
read them in Impala without any issues.

I am running Oozie-based workflows in a production environment that take
data from MySQL to HDFS (via Sqoop Hive imports) in raw format, then
store the same data again in Parquet format using the Impala shell; on
top of that, reports run as Impala queries. This has been working for a
few weeks without any issues.

Now I am trying to see whether I can import the data from MySQL to
Impala (Parquet) directly, to avoid the intermediate step.
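
For reference, the two-step pipeline described above might look roughly
like this from the command line (a minimal sketch: the staging and
Parquet table names are hypothetical, and -P prompts for the password
rather than embedding it):

    # Step 1: Sqoop pulls the MySQL table into Hive as a raw text table.
    sqoop import \
      --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
      --username root -P \
      --table bats_active \
      --hive-import --hive-database gwynniebee_bi \
      --hive-table bats_active_raw -m 1

    # Step 2: the Impala shell rewrites the raw table as Parquet for reports.
    impala-shell -q "CREATE TABLE gwynniebee_bi.bats_active_pq
      STORED AS PARQUET
      AS SELECT * FROM gwynniebee_bi.bats_active_raw;"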



On Fri, Jun 26, 2015 at 1:02 PM, Abraham Elmahrek <abe@cloudera.com> wrote:

> Check your config. They should use the same metastore.
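
One quick way to check that (a sketch assuming CDH-style config
locations; the paths may differ on your cluster):

    # Hive and Impala should resolve to the same metastore URI.
    grep -A1 'hive.metastore.uris' /etc/hive/conf/hive-site.xml
    grep -A1 'hive.metastore.uris' /etc/impala/conf/hive-site.xml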
>
> On Fri, Jun 26, 2015 at 12:26 AM, Manikandan R <manirajv06@gmail.com>
> wrote:
>
>> Yes, it works. I set HCAT_HOME to HIVE_HOME/hcatalog.
>>
>> I am able to read the data from Hive, but not from the Impala shell.
>> Is there any workaround?
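
Worth checking here: Impala caches Hive metastore state, so a table
created outside Impala (e.g. by a Sqoop Hive import) is not visible
until its metadata is refreshed. A minimal sketch:

    # Tell Impala to reload metadata for the newly imported table.
    impala-shell -q "INVALIDATE METADATA gwynniebee_bi.test_pq_bats_active;"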
>>
>> Thanks,
>> Mani
>>
>> On Thu, Jun 25, 2015 at 7:27 PM, Abraham Elmahrek <abe@cloudera.com>
>> wrote:
>>
>>> Make sure HIVE_HOME and HCAT_HOME are set.
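
A minimal sketch of setting those, assuming a /usr/lib install layout
(adjust the paths for your distribution):

    # Sqoop's Hive/HCatalog integration needs both variables exported.
    export HIVE_HOME=/usr/lib/hive          # assumed install path
    export HCAT_HOME=$HIVE_HOME/hcatalog    # HCatalog ships inside Hive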
>>>
>>> For the datetime/timestamp issue... this is because Parquet doesn't
>>> support timestamp types yet. Avro schemas apparently support them as
>>> of 1.8.0: https://issues.apache.org/jira/browse/AVRO-739. Try casting
>>> to a numeric or string value first?
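
One way to apply that cast on the Sqoop side is --map-column-java,
which overrides the Java type Sqoop uses for a column (a sketch;
create_date is the column named in the Impala error later in this
thread):

    # Import the datetime column as a string rather than a long.
    sqoop import \
      --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
      --username root -P --table bats_active \
      --map-column-java create_date=String \
      --as-parquetfile -m 1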
>>>
>>> -Abe
>>>
>>> On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <manirajv06@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am running
>>>>
>>>> ./sqoop import \
>>>>   --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
>>>>   --username root --password gwynniebee \
>>>>   --table bats_active \
>>>>   --hive-import --hive-database gwynniebee_bi \
>>>>   --hive-table test_pq_bats_active \
>>>>   --null-string '\\N' --null-non-string '\\N' \
>>>>   --as-parquetfile -m1
>>>>
>>>> and getting the exception below. I learned from various sources that
>>>> $HIVE_HOME has to be set properly to avoid this kind of error. In my
>>>> case, the corresponding home directory exists, but it still throws
>>>> the exception below.
>>>>
>>>> 15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
>>>> 15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>> org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>> at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
>>>> at org.kitesdk.data.Datasets.create(Datasets.java:228)
>>>> at org.kitesdk.data.Datasets.create(Datasets.java:307)
>>>> at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
>>>> at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
>>>> at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
>>>> at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>>>> at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>>>> at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
>>>> at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>>>> at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>>>> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>>> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>>
>>>> So, I tried an alternative approach: creating a Parquet file first,
>>>> without any Hive-related options, and then creating a table in Impala
>>>> referring to the same location. That worked fine, but it throws the
>>>> issue below (I think because of the date-related columns).
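
That alternative might look like the following sketch, using Impala's
CREATE TABLE ... LIKE PARQUET to infer the schema from an imported file
(the target directory and data-file name are illustrative):

    # Step 1: import straight to HDFS as Parquet, with no Hive options.
    sqoop import \
      --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
      --username root -P --table bats_active \
      --as-parquetfile -m 1 \
      --target-dir /data/gwynniebee_bi/test_pq_bats_active

    # Step 2: point an external Impala table at the imported files.
    impala-shell -q "CREATE EXTERNAL TABLE gwynniebee_bi.test_pq_bats_active
      LIKE PARQUET '/data/gwynniebee_bi/test_pq_bats_active/datafile.parquet'
      STORED AS PARQUET
      LOCATION '/data/gwynniebee_bi/test_pq_bats_active';"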
>>>>
>>>> ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet
>>>> has an incompatible type with the table schema for column create_date.
>>>> Expected type: BYTE_ARRAY.  Actual type: INT64
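
The INT64 in this error is the datetime stored as a long, presumably
epoch milliseconds (the usual Sqoop/Avro encoding; an assumption here),
while the table schema declares the column as a string (BYTE_ARRAY).
If the column is instead declared as BIGINT, it can be converted at
query time, e.g.:

    # Read the epoch-millisecond long back as a readable timestamp.
    impala-shell -q "SELECT from_unixtime(cast(create_date / 1000 AS bigint))
      FROM gwynniebee_bi.test_pq_bats_active LIMIT 5;"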
>>>>
>>>> Then I tried a table without the datetime columns, and it works fine
>>>> in that case.
>>>>
>>>> I am using Hive 0.13 and sqoop-1.4.6.bin__hadoop-2.0.4-alpha.
>>>>
>>>> I would prefer the first approach for my requirements. Can anyone
>>>> please help me in this regard?
>>>>
>>>> Thanks,
>>>> Mani
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
