sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Hsu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1946) DateSplitter relies on database string-to-date conversion when creating splits based on date columns
Date Wed, 20 Feb 2019 15:49:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773124#comment-16773124
] 

Michael Hsu commented on SQOOP-1946:
------------------------------------

It's been almost 1 month. IMHO, this is a important fix for the Oracle data source. I'm asking
the board to review it.

> DateSplitter relies on database string-to-date conversion when creating splits based
on date columns
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1946
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1946
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/generic
>    Affects Versions: 1.4.5
>            Reporter: Andre Araujo
>            Priority: Major
>         Attachments: SQOOP-1946.patch
>
>
> When running a sqoop import job with a split-by column of data type DATE, the DateSplitter
relies on the database implicit string-to-date convertion when generating the lower and upper
bound clauses for the splits. The splits are generated with clauses similar to the following:
> {code}
> date_col >= '2013-08-26 00:00:00.0'
> date_col <= '2013-08-26 00:00:00.0'
> {code}
> This forces the database to either implcitly convert the date_col to string or implicitly
cast the literal string to a date type. In case the database default date format is not the
expected, this could lead to either exceptions, as in the example below, or to unexpected
behavior (string comparison of to dates in different string formats).
> For Oracle databases, for example, we may see the following errors when the database
cannot implicitly convert the string above to date:
> {code}
> 2014-12-23 12:38:25,690 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split:
sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
> 2014-12-23 12:38:25,745 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working
on split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
> 2014-12-23 12:38:25,860 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing
query: select * from schema.TABLE where ( date_col >= '2013-08-26 00:00:00.0' ) AND ( date_col
<= '2013-08-26 00:00:00.0' )
> 2014-12-23 12:38:25,909 ERROR [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Top
level exception:  
> java.sql.SQLDataException: ORA-01861: literal does not match format string 
> {code}
> A workaround for that is to set the default date format for the database user used by
Sqoop to the "expected" format using a logon trigger in Oracle, like the one below:
> {code}
> CREATE OR REPLACE TRIGGER tr_a_l_set_date_format
>    AFTER LOGON ON DATABASE WHEN (USER = 'SQOOP_USER')
>    BEGIN
>      execute immediate 'alter session set nls_date_format="yyyy-mm-dd hh24:mi:ss"';
>    END;
> {code}
> A better form, though, would be to explicitly convert the date string literal to a DATE
using a specific format, in the exact same way that OracleManager.datetimeToQueryString()
does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message