sqoop-dev mailing list archives

From "Shreyas Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-3242) Import from MySQL to Hive table failing
Date Thu, 05 Oct 2017 19:13:00 GMT
Shreyas Joshi created SQOOP-3242:
------------------------------------

             Summary: Import from MySQL to Hive table failing
                 Key: SQOOP-3242
                 URL: https://issues.apache.org/jira/browse/SQOOP-3242
             Project: Sqoop
          Issue Type: Bug
         Environment: Sqoop version 1.4.6
MySQL version 5.7.19
Hive version 1.2.0
Debian Jessie

            Reporter: Shreyas Joshi


*Problem description:*

We are importing data from MySQL into ORC tables. A few days ago the import process started
failing, and has been failing since. 
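
For context, the stack traces below come from an HCatalog-based write into the static partition
visible in the scratch path ({{snapshot_date}}/{{snapshot_version}}). An import of roughly the
following shape matches that setup; the host, database, table name and credentials here are
placeholders (the partition values are the ones from the scratch path), not the actual command we run:

{noformat}
# Sketch only -- connection string, names and credentials are placeholders.
sqoop import \
  --connect jdbc:mysql://db-host/source_db \
  --username sqoop_user \
  --password-file /user/sqoop/.mysql.pw \
  --table commits_by_day \
  --hcatalog-database test_db \
  --hcatalog-table test_table \
  --hcatalog-partition-keys snapshot_date,snapshot_version \
  --hcatalog-partition-values 2017-09-25,1506322835 \
  --num-mappers 4
{noformat}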

It's always the first mapper that fails. The stack trace from the output of the failed map
job when copying all of the columns from the MySQL table is:

{noformat}
2017-10-05 10:08:44,794 WARN [DataStreamer for file /some_location/_SCRATCH0.6754802978897544/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122293_m_000000_0/part-m-00000]
org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /some_location/_SCRATCH0.6754802978897544/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122293_m_000000_0/part-m-00000
(inode 100779406): File does not exist. Holder DFSClient_attempt_1496703039885_122293_m_000000_0_1190864689_1
does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3313)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3169)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
2017-10-05 10:10:43,950 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during
close for org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@362018fd
java.lang.IllegalArgumentException: Column has wrong number of index entries found: 0 expected:
822
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:802)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1742)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2425)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
	at org.apache.hive.hcatalog.mapreduce.StaticPartitionFileRecordWriterContainer.close(StaticPartitionFileRecordWriterContainer.java:53)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
	at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2012)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-10-05 10:10:43,950 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running
child : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /some_location/_SCRATCH0.6754802978897544/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122293_m_000000_0/part-m-00000
(inode 100779406): File does not exist. Holder DFSClient_attempt_1496703039885_122293_m_000000_0_1190864689_1
does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3313)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3169)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
{noformat}

*Source table:*
Here's what the source table in MySQL looks like:

{noformat}
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| id             | bigint(11)       | NO   | PRI | NULL    | auto_increment |
| user_id        | int(11) unsigned | YES  | MUL | NULL    |                |
| repository_id  | int(11) unsigned | YES  | MUL | NULL    |                |
| commit_count   | int(11) unsigned | YES  |     | 0       |                |
| committed_date | date             | YES  |     | NULL    |                |
| created_at     | datetime         | YES  |     | NULL    |                |
| updated_at     | datetime         | YES  |     | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+
{noformat}

*Destination table:*

Here's what the destination table in Hive looks like:

{noformat}
CREATE TABLE `test_db.test_table`(
  `id` bigint,
  `user_id` bigint,
  `repository_id` bigint,
  `commit_count` bigint,
  `committed_date` date,
  `created_at` timestamp,
  `updated_at` timestamp)
PARTITIONED BY (
  `snapshot_date` string,
  `snapshot_version` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
  'transactional'='true'
)
{noformat}

*More info:*

If the Hive table is updated so that only the {{id}} column is imported, the import succeeds.
However, when the {{commit_count}} column is added (i.e. importing both {{id}} and {{commit_count}}),
the import fails. The failure is slightly different from the one above:

{noformat}
2017-10-05 10:44:11,862 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during
close for org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@6f77bb11
java.nio.channels.ClosedChannelException
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1622)
	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl$DirectStream.output(WriterImpl.java:464)
	at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:242)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2328)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2426)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
	at org.apache.hive.hcatalog.mapreduce.StaticPartitionFileRecordWriterContainer.close(StaticPartitionFileRecordWriterContainer.java:53)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
	at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2012)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-10-05 10:44:11,863 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running
child : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /some_path/_SCRATCH0.2200603632389685/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122336_m_000000_0/part-m-00000
(inode 100782116): File does not exist. Holder DFSClient_attempt_1496703039885_122336_m_000000_0_1708047031_1
does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3604)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3574)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:700)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

	at org.apache.hadoop.ipc.Client.call(Client.java:1468)
	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy12.complete(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:443)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.complete(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2250)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2234)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2429)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
	at org.apache.hive.hcatalog.mapreduce.StaticPartitionFileRecordWriterContainer.close(StaticPartitionFileRecordWriterContainer.java:53)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}
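
Restricting the import to a subset of columns, as described above, can be expressed with Sqoop's
{{--columns}} option; a minimal sketch using the same placeholder names as before:

{noformat}
# Sketch only -- placeholder names; imports just id and commit_count.
sqoop import \
  --connect jdbc:mysql://db-host/source_db \
  --table commits_by_day \
  --columns "id,commit_count" \
  --hcatalog-database test_db \
  --hcatalog-table test_table \
  --hcatalog-partition-keys snapshot_date,snapshot_version \
  --hcatalog-partition-values 2017-09-25,1506322835
{noformat}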

* We seem to have better luck with a single mapper, but this takes too long, so it is not a realistic
solution.
* We also tried disabling speculative execution, but that did not help either. Both options are
sketched below.
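
For reference, the two mitigations above can be expressed as follows (a sketch with the same
placeholder names; the speculative-execution property shown is the standard Hadoop 2.x one and
is assumed here for illustration):

{noformat}
# 1) Single mapper: completes, but far too slow for our table sizes.
#    (HCatalog partition options omitted for brevity)
sqoop import -m 1 \
  --connect jdbc:mysql://db-host/source_db --table commits_by_day \
  --hcatalog-database test_db --hcatalog-table test_table

# 2) Map-side speculative execution disabled; still fails as above.
#    The generic -D option must precede the tool-specific arguments.
sqoop import -D mapreduce.map.speculative=false \
  --connect jdbc:mysql://db-host/source_db --table commits_by_day \
  --hcatalog-database test_db --hcatalog-table test_table
{noformat}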



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
