sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "zhaohong (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SQOOP-801) Missing data when Importing data split by the column of the date type from oracle
Date Wed, 09 Oct 2013 04:40:43 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

zhaohong resolved SQOOP-801.
----------------------------

    Resolution: Fixed

> Missing data  when Importing data split by  the column of the date type from oracle 

> -------------------------------------------------------------------------------------
>
>                 Key: SQOOP-801
>                 URL: https://issues.apache.org/jira/browse/SQOOP-801
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.2
>         Environment: source dababase: oracle
> os: centos 5.4
> hadoop version: 0.20.2-cdh3u2
>            Reporter: zhaohong
>              Labels: patch
>
> The command is below:
> ============================
> sqoop import --connect 'jdbc:oracle:thin:@???.???.???.???:1521:staging1' --username v_stage
--password 'v_stage' 
> --table shoppingcart 
> --columns 'trace_id,checkout_status ,entered_price,promotion_id ,permanent_id ,from_platform,points,shop_id
,creation_date,productid ,category_path,data_date ,cust_id ,cart_id ,last_changed_date ,order_from_ip,counts,reference_url,cart_item_id
,special_sale' 
> --where "last_changed_date >= TO_DATE('2012-12-11','yyyy-mm-dd') and last_changed_date
< TO_DATE('2012-12-12','yyyy-mm-dd')" 
> --fields-terminated-by '' 
> --lines-terminated-by '
> ' --split-by last_changed_date 
> --target-dir /share/comm/customer/zhao/ 
> --hive-delims-replacement ' ' 
> -m 20
> ============================
> The DataBase is Oracle and the type of last_changed_date is date. 
> There are 600010 records between 2012-12-11 and 2012-12-12  but I can only pull about
12 records using this command.
> the BoundingValsQuery is select min(last_change_date), max(last_change_date)
> from shoppingcart where last_chage_date >=TO_DATA('2012-12-11','yyyy-mm-dd') and last_chage_date<TO_DATA('2012-12-12','yyyy-mm-dd')
> I debuged the source code and found that the min and max val are both equal to 2012-12-11
and The query condition look like : 
> where last_chage_date>=TO_TIMESTAMP(‘2012-12-11’,‘YYYY-MM-DD’) and last_chage_date<=TO_TIMESTAMP(‘2012-12-11’,‘YYYY-MM-DD’)
that's why I can only get 12 records whose he last_chage_date is equal to 2012-12-11 00:00:00
exactly.
> I do some modification and the result is right.
>  diff --git a/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java b/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> index 31e9351..812c6bb 100644
> --- a/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> +++ b/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> @@ -140,7 +144,7 @@ public class DateSplitter extends IntegerSplitter {
>      try {
>        switch (sqlDataType) {
>        case Types.DATE:
> -        return rs.getDate(colNum).getTime();
> +       return rs.getTimestamp(colNum).getTime();
>        case Types.TIME:
>          return rs.getTime(colNum).getTime();
>        case Types.TIMESTAMP:
> @@ -160,7 +164,7 @@ public class DateSplitter extends IntegerSplitter {
>    private Date longToDate(long val, int sqlDataType) {
>      switch (sqlDataType) {
>      case Types.DATE:
> -      return new java.sql.Date(val);
> +      return new java.sql.Timestamp(val);
>      case Types.TIME:
>        return new java.sql.Time(val);
>      case Types.TIMESTAMP:
>  



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message