sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michal Klempa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3264) Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP
Date Wed, 29 Nov 2017 09:22:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270457#comment-16270457
] 

Michal Klempa commented on SQOOP-3264:
--------------------------------------

https://reviews.apache.org/r/64148/

> Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP
> ------------------------------------------------------------------------------
>
>                 Key: SQOOP-3264
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3264
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: hive-integration
>    Affects Versions: 1.4.6
>            Reporter: Michal Klempa
>            Priority: Minor
>             Fix For: 1.4.7
>
>
> When importing JDBC SQL  Types:
> {code}
>         public final static int DATE            =  91;
>         public final static int TIME            =  92;
>         public final static int TIMESTAMP       =  93;
> {code}
> Sqoop currently uses the org.apache.sqoop.hive.HiveTypes.toHiveType method, where all
of these types are mapped to STRING type.
> Given that in fact, the JDBC value returned is of type Long, let me propose we can output
the type for Hive as:
> {code}
> DATE -> TIMESTAMP
> TIME -> BIGINT
> TIMESTAMP -> TIMESTAMP
> {code}
> This is also in line with org.apache.sqoop.manager.ConnManager.toAvroType, where the
type is 
> {code}
>     case Types.DATE:
>     case Types.TIME:
>     case Types.TIMESTAMP:
>       return Type.LONG;
> {code}
> Some of the connectors override the toJavaType:
> {code}
> org.apache.sqoop.manager.SQLServerManager
> org.apache.sqoop.manager.oracle.OraOopConnManager
> {code}
> which may indicate different handling.
> The SQLServerManager uses Java String as the output type, because of timezones.
> Same holds true for OraOopConnManager, although it has a separate configuration boolean
value 
> 'oraoop.timestamp.string' which controls whether the import will use timezones and convert
date types
> to Java String, or timezones are going to be dropped and import will behave the 'sqoop
way'.
> Both of these connectors already handle these types as String by default, proposed change
would not affect them.
> Other connectors are needed to be checked.
> Some of the connectors override the toHiveType:
> {code}
> org.apache.sqoop.manager.oracle.OraOopConnManager
> {code}
> This connector uses the 'sqoop way':
> {code}
>     String hiveType = super.toHiveType(sqlType);
> {code}
> and only when not resolved, the type used is decided:
> {code}
>     if (hiveType == null) {
>       // http://wiki.apache.org/hadoop/Hive/Tutorial#Primitive_Types
>       if (sqlType == OraOopOracleQueries.getOracleType("BFILE")
>           || sqlType == OraOopOracleQueries.getOracleType("INTERVALYM")
>           || sqlType == OraOopOracleQueries.getOracleType("INTERVALDS")
>           || sqlType == OraOopOracleQueries.getOracleType("NCLOB")
>           || sqlType == OraOopOracleQueries.getOracleType("NCHAR")
>           || sqlType == OraOopOracleQueries.getOracleType("NVARCHAR")
>           || sqlType == OraOopOracleQueries.getOracleType("OTHER")
>           || sqlType == OraOopOracleQueries.getOracleType("ROWID")
>           || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPTZ")
>           || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPLTZ")
>           || sqlType == OraOopOracleQueries.getOracleType("STRUCT")) {
>         hiveType = "STRING";
>       }
>       if (sqlType == OraOopOracleQueries.getOracleType("BINARY_FLOAT")) {
>         hiveType = "FLOAT";
>       }
>       if (sqlType == OraOopOracleQueries.getOracleType("BINARY_DOUBLE")) {
>         hiveType = "DOUBLE";
>       }
>     }
> {code}
> This code is affected with proposed change. As the Hive TIMESTAMP is timezone-less, we
have to change the handling in this method - respect the property 'oraoop.timestamp.string'
- if true, output STRING hive type, if false, go with 'sqoop way'.
> The Hive Type is only used when generating the table ddl (create statement) and Hive
can properly recognize the  JDBC compliant java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff",
so no connector should be affected in a way, that Hive would not read the resulting column
values.
> However, thorough testing should be done on all connectors before releasing any column
type behavior changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message