sqoop-dev mailing list archives

From "Venkat Ramachandran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-2387) NPE thrown when sqoop tries to import table with column name containing some special character
Date Sat, 25 Jul 2015 00:06:04 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkat Ramachandran updated SQOOP-2387:
    Attachment: SQOOP-2387.2.patch

Attaching another patch with which all unit tests pass (including the Avro import tests). The
approach here is different from the first patch.

Sqoop applies a column-name cleansing step that transforms the column names when generating the
ORM class, and this works well end to end when the output is HDFS (text or Avro).

However, it does not work when the destination is Hive/HCatalog, because the DDL contains the
original database column names. This patch uses the cleansed column names when creating the DDL
for Hive/HCatalog.

IMO, this way the column names are consistent between Avro and Hive/HCatalog (with special
characters replaced by _).
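
To illustrate the idea (this is only a rough sketch, not the code in the patch), the snippet below cleanses column names and then builds the DDL from the cleansed names. The class `CleansedDdlSketch`, the helpers `cleanse` and `createTableDdl`, and the column `order-id` are all hypothetical names made up for this example; Sqoop's real cleansing rules may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CleansedDdlSketch {

    // Hypothetical helper: replaces every character that is not legal in a
    // Java identifier with '_'. Illustration only, not Sqoop's own code.
    static String cleanse(String columnName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columnName.length(); i++) {
            char c = columnName.charAt(i);
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        // An identifier cannot start with a digit, so prefix '_' if needed.
        if (sb.length() == 0 || !Character.isJavaIdentifierStart(sb.charAt(0))) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }

    // Hypothetical helper: builds a CREATE TABLE statement whose column
    // names are the cleansed names, so they match the generated class.
    static String createTableDdl(String table, Map<String, String> columns) {
        List<String> defs = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            defs.add("`" + cleanse(e.getKey()) + "` " + e.getValue());
        }
        return "CREATE TABLE `" + table + "` (" + String.join(", ", defs) + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "INT");
        columns.put("order-id", "STRING"); // assumed column with a hyphen
        System.out.println(createTableDdl("some_table", columns));
    }
}
```

With the DDL built this way, the name the mapper looks up and the name in the table definition come from the same function, so they cannot drift apart.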

> NPE thrown when sqoop tries to import table with column name containing some special character
> ----------------------------------------------------------------------------------------------
>                 Key: SQOOP-2387
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2387
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.5, 1.4.6
>         Environment: HDP
>            Reporter: Pavel Benes
>            Priority: Critical
>         Attachments: SQOOP-2387.1.patch, SQOOP-2387.2.patch, SQOOP-2387.patch, joblog.txt,
> This sqoop import:
> {code}
> sqoop import --connect jdbc:mysql://some.merck.com:1234/dbname --username XXX --password YYY --table some_table --hcatalog-database some_database --hcatalog-table some_table --hive-partition-key mg_version --hive-partition-value 2015-05-28-13-18 -m 1 --verbose --fetch-size -2147483648
> {code}
> fails with this error:
> {code}
> 2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
> 	at org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
> 	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
> 	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
> 	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> It seems that the error is caused by a column name containing a hyphen ('-'). Column
> names are converted to Java identifiers, but the converted name cannot later be found in
> the HCatalog schema.
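
The mismatch described in the report can be reproduced in miniature. The sketch below is an assumption-laden illustration: a plain `Map` stands in for the HCatalog schema (it is not the HCatalog API), and `SchemaLookupSketch`, `cleanse`, `typeNameLength` and the column `order-id` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SchemaLookupSketch {

    // Hypothetical cleansing step: anything outside [A-Za-z0-9_] becomes '_'.
    static String cleanse(String columnName) {
        return columnName.replaceAll("[^A-Za-z0-9_]", "_");
    }

    // Looks up a field and dereferences the result without a null check,
    // which is the pattern that produces a NullPointerException.
    static int typeNameLength(Map<String, String> schema, String fieldName) {
        String type = schema.get(fieldName); // null when the name is absent
        return type.length();                // NPE if type is null
    }

    public static void main(String[] args) {
        // The schema is keyed by the ORIGINAL database column name.
        Map<String, String> schema = new HashMap<>();
        schema.put("order-id", "int");

        // The generated class uses the CLEANSED name instead.
        String generated = cleanse("order-id");

        try {
            typeNameLength(schema, generated);
        } catch (NullPointerException e) {
            System.out.println("No field named '" + generated + "' in the schema");
        }
    }
}
```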

This message was sent by Atlassian JIRA