sqoop-dev mailing list archives

From Daniel Lanza García (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1600) Exception when import data using Data Connector for Oracle with TIMESTAMP column type to Parquet files
Date Tue, 18 Nov 2014 10:11:34 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216025#comment-14216025 ]

Daniel Lanza García commented on SQOOP-1600:

If you want to write timestamp fields in the format that Impala expects for timestamps in
Parquet files, you will have to implement it yourself, because it is not implemented yet.

However, you are in luck: I have already implemented it, and it is available in my GitHub
repositories (https://github.com/dlanza1). You should clone and compile my three repos and add
them to the classpath.

Another, easier option is to cast the bigint to a timestamp as follows: select cast((hiredate
/ 1000) as TIMESTAMP) from aaa;
The drawback is that you have to cast in every query. If you do not have a lot of data, you can
instead generate a new table with an INSERT ... SELECT statement that applies the cast, so that
the new table has a column of timestamp type.
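
The division by 1000 in the cast presumably reflects that the imported bigint holds milliseconds
since the Unix epoch, while the cast expects seconds. A minimal Python sketch of that arithmetic
follows; the sample value and the helper name millis_to_utc are assumptions made for illustration,
not something taken from the issue.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based values.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert epoch milliseconds to a UTC datetime using exact integer arithmetic."""
    return EPOCH + timedelta(milliseconds=millis)


# Hypothetical value of the bigint column (assumed to be epoch milliseconds).
hiredate = 1341101084403

# Whole seconds and leftover milliseconds, which is what dividing by 1000 separates.
seconds, leftover_millis = divmod(hiredate, 1000)
print(seconds, leftover_millis)

print(millis_to_utc(hiredate).isoformat())
```

Integer arithmetic is used here so that the millisecond part is not affected by floating-point
rounding.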

I hope it helps you.

> Exception when import data using Data Connector for Oracle with TIMESTAMP column type to Parquet files
> ------------------------------------------------------------------------------------------------------
>                 Key: SQOOP-1600
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1600
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>         Environment: Hadoop version: 2.5.0-cdh5.2.0
> Sqoop: 1.4.5
>            Reporter: Daniel Lanza García
>              Labels: Connector, Oracle, Parquet, Timestamp
>             Fix For: 1.4.6
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> An error is thrown in each mapper when an import job is run using the Quest Data Connector
> for Oracle (--direct argument), the source table has a column of type TIMESTAMP, and the
> destination files are in Parquet format.
> The mapper's log shows the following error:
> WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 2012-7-1 0:4:44. 403000000
> This means that the data obtained by the mapper (from the connector) is not of the type
> that the schema describes for this field. As the error shows, the problem is related to the
> column UTC_STAMP (the only column in the source table that stores a timestamp).
> If we check the generated schema for this column, we can see that the column is of type
> long with SQL data type TIMESTAMP (93), which is correct.
> Schema: {"name" : "UTC_STAMP","type" : [ "long", "null" ],"columnName" : "UTC_STAMP","sqlType" : "93"}
> If we debug the method where the exception is thrown (org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:605)),
> we can see that the problem arises because the data obtained by the mapper is of type
> String, which doesn't match the type described by the schema (long).
> The exception is not thrown when the destination files are text files, because no schema
> is generated when you import to text files.
> Solution
> The documentation has a section that describes how to manage data and timestamps when you
> use the Data Connector for Oracle and Hadoop. As that section explains, this connector
> handles this type of data differently. However, this behavior can be disabled, as the
> section describes, with the parameter below.
> -Doraoop.timestamp.string=false
> Although this parameter solves the problem (it is mandatory under these conditions), the
> software should handle this type of column without throwing an exception.
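
The type mismatch described in the quoted issue can be illustrated with a small Python sketch. It
is a simplified stand-in for union resolution, not Avro's actual code: the names resolve_union,
UnresolvedUnion and BRANCH_CHECKS are hypothetical, and only the branch list ["long", "null"] and
the string value come from the issue.

```python
# Simplified stand-in for union resolution; this is NOT the Avro library's code.
BRANCH_CHECKS = {
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "null": lambda v: v is None,
}


class UnresolvedUnion(Exception):
    """Hypothetical error raised when no union branch accepts the value."""


def resolve_union(branches, value):
    """Return the index of the first branch that accepts the value."""
    for index, branch in enumerate(branches):
        if BRANCH_CHECKS[branch](value):
            return index
    raise UnresolvedUnion(f"Not in union {branches}: {value}")


# Field type of UTC_STAMP as shown in the generated schema.
schema_type = ["long", "null"]

print(resolve_union(schema_type, 1341101084403))  # an integer matches the "long" branch
print(resolve_union(schema_type, None))           # a NULL value matches the "null" branch

try:
    # The connector handed the mapper a string, which matches neither branch.
    resolve_union(schema_type, "2012-7-1 0:4:44. 403000000")
except UnresolvedUnion as error:
    print(error)
```

With the timestamp delivered as a string, neither branch accepts the value, which corresponds to
the situation reported in the mapper log.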
