sqoop-user mailing list archives

From Christian Verkerk <christ...@tubularlabs.com>
Subject mysqlimport terminated with error code 1
Date Mon, 15 Sep 2014 14:14:43 GMT
Hi,

I'm trying to run a sqoop export for a large dataset (at least 1B
rows) with the following sqoop export call:

sqoop export --direct \
--connect <host> \
--table <table> \
--export-dir /user/hive/warehouse/<table> \
--num-mappers 8 \
--username <username> \
--password <password> \
--input-fields-terminated-by ',' \
--verbose

Behind the scenes, I've found that sqoop export does what you'd expect
it to: it farms out the work to (num-mappers) nodes with a NodeManager
role, writes about 32MB worth of HDFS data into a temp file on each
node, and hands that file to mysqlimport, which issues a LOAD DATA
LOCAL INFILE for the tempfile into the MySQL table.
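For reference, my understanding is that each mapper effectively runs
something like the following (this is my own sketch -- the database name,
paths, and options are placeholders, not what Sqoop literally executes):

```shell
# Rough sketch of the per-mapper mysqlimport call (placeholders throughout).
# mysqlimport derives the target table name from the input file name, so the
# temp file is named after the table being exported into.
mysqlimport --local \
  --host=<host> \
  --user=<username> \
  --password=<password> \
  --fields-terminated-by=',' \
  <database> /tmp/sqoop-export/<table>.txt

# ...which the server executes as roughly:
#   LOAD DATA LOCAL INFILE '/tmp/sqoop-export/<table>.txt'
#   INTO TABLE <table> FIELDS TERMINATED BY ',';
```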

The following error occurs depending on the level of parallelism used
(via num-mappers), that is, 2 mappers doesn't trigger it but 10
definitely will:

14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
Error: java.io.IOException: mysqlimport terminated with error code 1
	at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
	at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

I understand there is some limit to the level of parallelism that can
be achieved in the job -- mysqld can get tied up processing too many
concurrent imports, etc. -- but I'd like to know how to turn on
debugging for the org.apache.sqoop.mapreduce.MySQLExportMapper class so
that I can actually see the underlying mysqlimport error.
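One thing I'm considering (unconfirmed -- the DEBUG lines come from the
map task, so my guess is the task-side log level has to be raised via the
generic Hadoop option rather than the client-side log4j.properties):

```shell
# Sketch: raise the map-task log level so MySQLExportMapper's LOG.debug()
# output lands in the YARN task logs. The -D generic option must come before
# the tool-specific arguments; connection details are placeholders.
sqoop export \
  -D mapreduce.map.log.level=DEBUG \
  --direct \
  --connect <host> \
  --table <table> \
  --export-dir /user/hive/warehouse/<table> \
  --num-mappers 8 \
  --username <username> \
  --password <password> \
  --input-fields-terminated-by ',' \
  --verbose

# Then pull the aggregated task logs (application id inferred from the
# failed attempt id above):
yarn logs -applicationId application_1410271365435_0034
```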

Reading through the following code[0] (not sure if this is the
relevant version BTW), I see that a logger is set up that should be
giving a lot of information[1] about the mysqlimport calls, but I
don't seem to be getting any of this fun in my logs.

[0] https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
[1] `LOG.debug("Starting mysqlimport with arguments:");`


Additional info:

I have log4j.properties setup in the following basic way:

log4j.rootLogger=${root.logger}
root.logger=INFO,console

log4j.logger.org.apache.hadoop.mapred=TRACE
log4j.logger.org.apache.sqoop.mapreduce=TRACE
log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

What I have found is that the `max_allowed_packet` setting in MySQL
seems to affect this behaviour somewhat, but I'd rather get more
information about the actual error than attempt to tweak a setting
"blind".
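For completeness, this is what I've been poking at on the MySQL side
(standard statements run through the mysql client; the 256MB value is just
an example, not a recommendation, and the connection details are
placeholders):

```shell
# Check the current limit (value is in bytes):
mysql -h <host> -u <username> -p \
  -e "SHOW VARIABLES LIKE 'max_allowed_packet';"

# Raise it for new connections, e.g. to 256MB. To make it permanent it
# would go in my.cnf under [mysqld] instead:
mysql -h <host> -u <username> -p \
  -e "SET GLOBAL max_allowed_packet = 268435456;"
```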

Relevant versioning:

Cloudera Hadoop Distribution (5.1.2)
mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
sqoop version: 1.4.4

Kind regards,

Christian Verkerk

-- 
Christian Verkerk
Software Engineer, Tubular Labs
christian@tubularlabs.com
