sqoop-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-2783) Query import with parquet fails on incompatible schema
Date Sat, 16 Jan 2016 00:01:39 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102737#comment-15102737 ]

ASF subversion and git services commented on SQOOP-2783:
--------------------------------------------------------

Commit 926d92bac13ff8171d503aa0f7b429e030284e2f in sqoop's branch refs/heads/trunk from [~kathleen]
[ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=926d92b ]

SQOOP-2783: Query import with parquet fails on incompatible schema
 (Jarek Jarcec Cecho via Kate Ting)


> Query import with parquet fails on incompatible schema
> ------------------------------------------------------
>
>                 Key: SQOOP-2783
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2783
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.7
>
>         Attachments: SQOOP-2783.patch
>
>
> This is a follow-up on SQOOP-2582, where we added support for query imports into Parquet.
> It seems that when run on a real cluster (rather than a mini-cluster), the job fails with an
> exception similar to this one:
> {code}
> 16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
> Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
> Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
> Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
> 	at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
> 	at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
> 	at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
> 	at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
> 	at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
> 	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
> 	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
> 	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
> 	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
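> For reference, a query-based Parquet import of this shape should exercise the same code path;
> the connect string, credentials, and table below are made up, only the column names come from
> the schemas above:
> {code}
> sqoop import \
>   --connect jdbc:mysql://db.example.com/testdb \
>   --username sqoop --password sqoop \
>   --query 'SELECT c1_int, c2_date, c3_timestamp, c4_varchar20 FROM t1 WHERE $CONDITIONS' \
>   --split-by c1_int \
>   --target-dir /user/sqoop/queryresult \
>   --as-parquetfile
> {code}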
> Looking into the Sqoop and Kite source code, I was not able to pinpoint where the problem
> is until I found SQOOP-1395/SQOOP-2294, which describe a similar problem for table-based
> imports. I do not clearly understand why the test added back in SQOOP-2582 is not failing,
> but I assume that it's due to differences in the classpath on a mini-cluster versus a real
> cluster.
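> Kite's exact check is its own code, but Avro's schema compatibility checker illustrates the
> mismatch from the trace above: the generated {{QueryResult}} type schema carries ORM-only
> fields with no default values, so it cannot read data written with the dataset schema. A
> trimmed-down sketch (class name and field subset are illustrative):
> {code}
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaCompatibility;
> import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;
>
> public class SchemaMismatchCheck {
>   public static void main(String[] args) {
>     // Trimmed-down versions of the two schemas from the stack trace above.
>     // The type schema carries ORM-only fields such as PROTOCOL_VERSION,
>     // which have no default value and are absent from the dataset schema.
>     Schema typeSchema = new Schema.Parser().parse(
>         "{\"type\":\"record\",\"name\":\"QueryResult\",\"fields\":["
>         + "{\"name\":\"PROTOCOL_VERSION\",\"type\":\"int\"},"
>         + "{\"name\":\"c1_int\",\"type\":[\"null\",\"int\"],\"default\":null}]}");
>     Schema datasetSchema = new Schema.Parser().parse(
>         "{\"type\":\"record\",\"name\":\"QueryResult\",\"fields\":["
>         + "{\"name\":\"c1_int\",\"type\":[\"null\",\"int\"],\"default\":null}]}");
>
>     // Reading data written with the dataset schema through the type schema
>     // fails because PROTOCOL_VERSION cannot be filled in.
>     SchemaPairCompatibility result = SchemaCompatibility
>         .checkReaderWriterCompatibility(typeSchema, datasetSchema);
>     System.out.println(result.getType());  // prints INCOMPATIBLE
>   }
> }
> {code}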
> I would suggest changing the generated Avro schema name from {{QueryResult}} to something
> more generic, such as {{AutoGeneratedSchema}}, which will avoid this problem. I'm not
> particularly concerned about backward compatibility here because it doesn't make much sense
> to depend on a name that is generated for every single query-based import.
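> For illustration only, the rename amounts to giving the generated record a fixed name when
> the Avro schema is built; the helper name below is hypothetical and the real change belongs
> in Sqoop's schema generation:
> {code}
> // Before: the record name is derived from the generated class, e.g. "QueryResult".
> // After (suggested): a fixed, generic name that nothing downstream should depend on.
> Schema schema = Schema.createRecord(
>     "AutoGeneratedSchema",           // generic name suggested above
>     "Sqoop import of QueryResult",   // doc string, as emitted today
>     null,                            // no namespace
>     false);                          // not an error record
> schema.setFields(fieldsFromColumnTypes);  // hypothetical list of column fields
> {code}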



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
