sqoop-user mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: Unknown dataset URI issues in Sqoop hive import as parquet
Date Sun, 28 Jun 2015 18:07:22 GMT
Oh, that makes more sense. Seems like a format mismatch. You might have to
upgrade Impala. Mind providing the version of Impala you're using?
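
(A minimal way to check, assuming impala-shell is on the PATH:)

    # Print the impala-shell client version
    impala-shell --version
    # Ask the impalad you are connected to for its build version
    impala-shell -q "select version();"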

-Abe

On Fri, Jun 26, 2015 at 12:52 AM, Manikandan R <manirajv06@gmail.com> wrote:

> The actual errors are:
>
> Query: select * from gwynniebee_bi.mi_test
> ERROR: AnalysisException: Failed to load metadata for table:
> gwynniebee_bi.mi_test
> CAUSED BY: TableLoadingException: Unrecognized table type for table:
> gwynniebee_bi.mi_test
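
(A diagnostic sketch for this kind of error; the database and table names are
taken from the error above:)

    # In Hive: inspect what kind of table was actually created
    hive -e "DESCRIBE FORMATTED gwynniebee_bi.mi_test;"
    # In Impala: refresh the catalog, since Impala caches metastore state
    # and will not see tables created outside Impala until told to reload
    impala-shell -q "INVALIDATE METADATA gwynniebee_bi.mi_test;"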
>
> On Fri, Jun 26, 2015 at 1:21 PM, Manikandan R <manirajv06@gmail.com>
> wrote:
>
>> It should be the same, as I have created many tables in Hive before and
>> read them in Impala without any issues.
>>
>> I am running Oozie-based workflows in a production environment to move the
>> data from MySQL to HDFS (via Sqoop Hive imports) in raw format, then storing
>> the same data again in Parquet format using the Impala shell; on top of
>> that, reports run as Impala queries. This has been working for a few weeks
>> without any issues.
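
(The re-materialization step described above presumably looks something like
the following Impala CTAS; the table names here are hypothetical:)

    # Hypothetical intermediate step: raw text table -> Parquet table.
    impala-shell -q "CREATE TABLE gwynniebee_bi.bats_active_pq STORED AS PARQUET AS SELECT * FROM gwynniebee_bi.bats_active_raw"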
>>
>> Now I am trying to see whether I can import the data from MySQL to
>> Impala (Parquet) directly and avoid the intermediate step.
>>
>>
>>
>> On Fri, Jun 26, 2015 at 1:02 PM, Abraham Elmahrek <abe@cloudera.com>
>> wrote:
>>
>>> Check your config. Hive and Impala should use the same metastore.
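
(One way to verify, assuming default CDH-style config locations, which may
differ on your cluster:)

    # Both services read a hive-site.xml; the metastore URIs should match.
    grep -A1 "hive.metastore.uris" /etc/hive/conf/hive-site.xml
    grep -A1 "hive.metastore.uris" /etc/impala/conf/hive-site.xml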
>>>
>>> On Fri, Jun 26, 2015 at 12:26 AM, Manikandan R <manirajv06@gmail.com>
>>> wrote:
>>>
>>>> Yes, it works. I set HCAT_HOME to HIVE_HOME/hcatalog.
>>>>
>>>> I am able to read the data from Hive, but not from the Impala shell. Any
>>>> workaround?
>>>>
>>>> Thanks,
>>>> Mani
>>>>
>>>> On Thu, Jun 25, 2015 at 7:27 PM, Abraham Elmahrek <abe@cloudera.com>
>>>> wrote:
>>>>
>>>>> Make sure HIVE_HOME and HCAT_HOME are set.
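
(For example, something along these lines before invoking Sqoop; the paths
are placeholders for wherever Hive is installed on your machine:)

    # Illustrative paths only; point these at your actual Hive installation.
    export HIVE_HOME=/usr/lib/hive
    export HCAT_HOME=$HIVE_HOME/hcatalog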
>>>>>
>>>>> For the datetime/timestamp issue... this is because Parquet doesn't
>>>>> support timestamp types yet. Avro schemas support them as of 1.8.0,
>>>>> apparently: https://issues.apache.org/jira/browse/AVRO-739. Try
>>>>> casting to a numeric or string value first?
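
(One possible way to do that cast on the Sqoop side, assuming
--map-column-java behaves the same in your Sqoop version; create_date is the
column named in the original report below:)

    # Sketch: map the MySQL datetime column to a Java String during import,
    # so the Parquet file stores it as a string instead of INT64.
    ./sqoop import ... --map-column-java create_date=String --as-parquetfile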
>>>>>
>>>>> -Abe
>>>>>
>>>>> On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <manirajv06@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am running:
>>>>>>
>>>>>> ./sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
>>>>>>   --username root --password gwynniebee --table bats_active \
>>>>>>   --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active \
>>>>>>   --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m1
>>>>>>
>>>>>> and getting the exception below. I have learned from various sources
>>>>>> that $HIVE_HOME has to be set properly to avoid this kind of error. In
>>>>>> my case, the corresponding home directory exists, but it is still
>>>>>> throwing the exception below.
>>>>>>
>>>>>> 15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
>>>>>> 15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>>>> org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>>>>     at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
>>>>>>     at org.kitesdk.data.Datasets.create(Datasets.java:228)
>>>>>>     at org.kitesdk.data.Datasets.create(Datasets.java:307)
>>>>>>     at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
>>>>>>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
>>>>>>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
>>>>>>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>>>>>>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>>>>>>     at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
>>>>>>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>>>>>>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>>>>>>     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>>>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>>>>>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>>>>>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>>>>>     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>>>>
>>>>>> So, I tried an alternative solution: creating a Parquet file first,
>>>>>> without any Hive-related options, and creating a table referring to the
>>>>>> same location in Impala. That worked fine, but it is throwing the issues
>>>>>> below (I think because of the date-related columns).
>>>>>>
>>>>>> ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet
>>>>>> has an incompatible type with the table schema for column create_date.
>>>>>> Expected type: BYTE_ARRAY. Actual type: INT64
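
(A possible workaround sketch: declare the column to match what is actually
in the file, then convert at query time. This assumes the INT64 values are
epoch milliseconds, which is what Sqoop's Parquet import typically writes for
MySQL datetimes; table and column names are taken from the error above:)

    # Make the table schema match the file (INT64 reads as BIGINT in Impala).
    impala-shell -q "ALTER TABLE gwynniebee_bi.test_pq_bats_active CHANGE create_date create_date BIGINT"
    # Convert epoch milliseconds back to a readable timestamp at query time.
    impala-shell -q "SELECT from_unixtime(cast(create_date / 1000 AS BIGINT)) FROM gwynniebee_bi.test_pq_bats_active LIMIT 5"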
>>>>>>
>>>>>> Then I tried a table without the datetime columns, and it works fine in
>>>>>> that case.
>>>>>>
>>>>>> I am using Hive 0.13 and the sqoop-1.4.6.bin__hadoop-2.0.4-alpha binary.
>>>>>>
>>>>>> I would prefer the first approach for my requirements. Can anyone please
>>>>>> help me in this regard?
>>>>>>
>>>>>> Thanks,
>>>>>> Mani
