sqoop-dev mailing list archives

From "Venkat Ramachandran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SQOOP-2387) NPE thrown when sqoop tries to import table with column name containing some special character
Date Sat, 25 Jul 2015 00:11:05 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641272#comment-14641272 ]

Venkat Ramachandran edited comment on SQOOP-2387 at 7/25/15 12:10 AM:
----------------------------------------------------------------------

Attaching another patch that takes a different approach from the first patch.
Made sure all the unit tests pass by running
ant tests

Sqoop cleanses column names (transforming special characters into _) when generating the ORM class.
This works fine when the destination is HDFS (either text or Avro format).

But it fails when the destination is Hive/HCat, as the generated DDL has the original database
column names, whereas the ORM/record reader generates cleansed column names. This patch
uses the cleansed column names while creating the DDL for Hive/HCat.

IMO, this way the column names are consistent, whether in Avro or Hive/HCat (with special chars
replaced by _).
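For illustration only, the cleansing described above can be sketched as follows. This is a minimal sketch of the idea, not Sqoop's actual ClassWriter code: every character that is not legal in a Java identifier is replaced with an underscore.

```java
// Illustrative sketch (NOT Sqoop's actual implementation): cleanse a
// database column name into a valid Java identifier by replacing each
// illegal character with '_'.
public class ColumnCleanser {
    static String cleanse(String col) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < col.length(); i++) {
            char c = col.charAt(i);
            // The first character must be a valid identifier start;
            // subsequent characters only need to be valid identifier parts.
            boolean ok = (i == 0) ? Character.isJavaIdentifierStart(c)
                                  : Character.isJavaIdentifierPart(c);
            sb.append(ok ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanse("mg-version")); // prints: mg_version
    }
}
```

With the patch, the same cleansed form would appear in both the generated record class and the Hive/HCat DDL, so lookups on either side agree.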
 


was (Author: me.venkatr):
Attaching another patch with which all the unit tests pass (including the Avro import tests). The approach
here is different from the first patch.

Sqoop applies column cleansing that transforms the column names when generating the ORM class, and this
works well end-to-end when the output is HDFS (text or Avro).

But it does not work when the destination is Hive/HCat, as the DDL contains the original database
column names. This patch uses the cleansed column names while creating the DDL for Hive/HCat.

IMO, this way the column names are consistent, whether in Avro or Hive/HCat (with special chars
replaced by _).
 

> NPE thrown when sqoop tries to import table with column name containing some special character
> ----------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2387
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2387
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.5, 1.4.6
>         Environment: HDP 2.2.0.0-2041
>            Reporter: Pavel Benes
>            Priority: Critical
>         Attachments: SQOOP-2387.1.patch, SQOOP-2387.2.patch, SQOOP-2387.patch, joblog.txt, sqoop.log
>
>
> This sqoop import:
> {code}
> sqoop import --connect jdbc:mysql://some.merck.com:1234/dbname --username XXX --password YYY --table some_table --hcatalog-database some_database --hcatalog-table some_table --hive-partition-key mg_version --hive-partition-value 2015-05-28-13-18 -m 1 --verbose --fetch-size -2147483648
> {code}
> fails with this error:
> {code}
> 2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
> 	at org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
> 	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
> 	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
> 	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> It seems that the error is caused by a column name containing a hyphen ('-'). Column names are converted to Java identifiers, but this converted name can later not be found in the HCatalog schema.
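The failure mode the reporter describes can be reproduced in miniature. In this hypothetical sketch a HashMap stands in for HCatSchema's name-to-position lookup: the schema is keyed by the original column name, the generated record carries the cleansed name, so the lookup returns null and unboxing it throws the NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the mismatch (a HashMap stands in for
// HCatSchema): the DDL-side schema keeps the original column name,
// while the ORM/record-reader side uses the cleansed name.
public class SchemaMismatch {
    public static void main(String[] args) {
        Map<String, Integer> schemaPositions = new HashMap<>();
        schemaPositions.put("mg-version", 0);            // DDL keeps the original name

        Integer pos = schemaPositions.get("mg_version"); // cleansed ORM name -> null
        System.out.println(pos);                         // prints: null
        int position = pos;                              // throws NullPointerException
    }
}
```

This mirrors the stack trace above: HCatSchema.get fails to resolve the cleansed name and the NPE surfaces in convertToHCatRecord.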



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
