sqoop-user mailing list archives

From Christian Verkerk <christ...@tubularlabs.com>
Subject Re: mysqlimport terminated with error code 1
Date Mon, 15 Sep 2014 18:50:58 GMT
You were right, though the logs link in the ResourceManager would
disappear once the container died, so I couldn't get to the logs that
way -- going through the NodeManager that was executing the actual
task let me look at them, and it turns out the mysqlimport logs are
printed there:

2014-09-15 17:56:34,025 INFO [main] org.apache.sqoop.mapreduce.MySQLExportMapper: Checkpointing current export.
2014-09-15 17:56:34,025 INFO [main] org.apache.sqoop.mapreduce.MySQLExportMapper: Waiting for mysqlimport to complete
2014-09-15 17:56:34,032 INFO [Thread-34] org.apache.sqoop.mapreduce.MySQLExportMapper: mysqlimport: Error: 1205, Lock wait timeout exceeded; try restarting transaction, when using table: <table>
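
So the real failure is MySQL error 1205, i.e. InnoDB's lock wait
timeout. In case it helps anyone else, a quick way to inspect (and, as
a stopgap, raise) the timeout from the shell is something like the
following -- just a sketch; SET GLOBAL needs the SUPER privilege and a
my.cnf change to survive a server restart:

# Show the current timeout (MySQL 5.5 defaults to 50 seconds).
mysql -h <host> -u <username> -p -e "SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';"

# Raise it for new connections as a temporary workaround.
mysql -h <host> -u <username> -p -e "SET GLOBAL innodb_lock_wait_timeout = 120;"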

To answer your question below: I set the log4j config via
/etc/hadoop/conf/log4j.properties on the cluster nodes.
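
If you'd rather not edit the cluster-wide file, something like the
following should also work -- a sketch, untested here, and the
properties file path is hypothetical:

# Client side: point the local sqoop/hadoop JVM at a custom log4j file
# (log4j 1.x honours the log4j.configuration system property).
export HADOOP_CLIENT_OPTS="-Dlog4j.configuration=file:/path/to/my-log4j.properties"

# Task side: bump the map task log level so the LOG.debug output from
# MySQLExportMapper lands in the container logs.
sqoop export -D mapreduce.map.log.level=DEBUG --direct --connect <host> ...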

On Mon, Sep 15, 2014 at 11:32 AM, pratik khadloya <tispratik@gmail.com> wrote:
> Btw, Christian, can you tell me how you fed the custom log4j config to
> sqoop?
> I am interested in debugging the MySQLManager class.
>
> Thanks,
> ~Pratik
>
> On Mon, Sep 15, 2014 at 10:39 AM, pratik khadloya <tispratik@gmail.com>
> wrote:
>>
>> You're right, they are import-only arguments; I misread your original
>> question.
>> I'm surprised that there are no logs in the JT. You should be able to see
>> the logs for attempt_1410271365435_0034_m_000000_0,
>> and also see which machine ran that map task. You can click on the
>> machine name, then on the bottom left there is a "Local logs" link which
>> you can click to finally see the local mapper logs for that task tracker.
>>
>> The general url to directly get to those logs is:
>> http://<task-tracker-machine-name>:50060/tasktracker.jsp
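>>
>> If you're on YARN rather than MR1, and log aggregation is enabled, the
>> container logs for a finished job can also be pulled with the yarn CLI
>> (the application id is the attempt id minus the task/attempt suffix):
>>
>> yarn logs -applicationId application_1410271365435_0034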
>>
>> I suspect the loading command may be failing due to a column mismatch or
>> some delimiter problem.
>>
>> ~Pratik
>>
>> On Mon, Sep 15, 2014 at 10:18 AM, Christian Verkerk
>> <christian@tubularlabs.com> wrote:
>>>
>>> Hi,
>>>
>>> The jobtracker logs are all empty. The --split-by and --boundary-query
>>> options are sqoop import-only arguments AFAICT. The split size, as in
>>> the size of the file that is loaded into MySQL, is about 32MB.
>>>
>>> The sqoop export job I posted _does_ get data into MySQL; it just
>>> stops after a while (due to load, presumably), so running just one
>>> query against MySQL will work just fine and will not reproduce the
>>> error.
>>>
>>> The key is that I need some way to get more information on the exact
>>> error mysqlimport hits.
>>>
>>> Kind regards,
>>>
>>> Christian
>>>
>>> On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <tispratik@gmail.com>
>>> wrote:
>>> > Is there any reason given for the termination in the jobtracker logs?
>>> > Also, I see that you have not specified any --split-by and/or
>>> > --boundary-query option.
>>> > Does sqoop take time to determine the splits? If so, specifying these
>>> > settings might help.
>>> >
>>> > Also, check what the split sizes are; you may be running into a data
>>> > skew depending on the splitting column used (generally the primary
>>> > key of the table).
>>> > The query is printed in the sqoop logs; try running the same directly
>>> > on mysql and see how mysql responds.
>>> >
>>> > ~Pratik
>>> >
>>> > On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk
>>> > <christian@tubularlabs.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm trying to run a sqoop export for a large dataset (at least 1B
>>> >> rows) with the following sqoop export call:
>>> >>
>>> >> sqoop export --direct \
>>> >> --connect <host> \
>>> >> --table <table> \
>>> >> --export-dir /user/hive/warehouse/<table> \
>>> >> --num-mappers 8 \
>>> >> --username <username> \
>>> >> --password <password> \
>>> >> --input-fields-terminated-by ',' \
>>> >> --verbose
>>> >>
>>> >> Behind the scenes, I've found that sqoop export does what you'd expect
>>> >> it to: it farms out the work to (num-mappers) different nodes with a
>>> >> NodeManager role, gets about 32MB worth of HDFS data into a temp file
>>> >> on each of the nodes, and sends it along to mysqlimport, which
>>> >> generates a LOAD DATA LOCAL INFILE for the tempfile into the MySQL
>>> >> table.
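>>> >>
>>> >> Roughly, each mapper seems to shell out to something like the
>>> >> following -- a sketch based on the options I passed, not the exact
>>> >> argv sqoop builds, and the tempfile path is illustrative. mysqlimport
>>> >> derives the target table from the file name and wraps the file in a
>>> >> LOAD DATA LOCAL INFILE statement:
>>> >>
>>> >> mysqlimport --local --host=<host> --user=<username> --password \
>>> >>   --fields-terminated-by=',' <database> /tmp/sqoop-<user>/<table>.txt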
>>> >>
>>> >> The following error occurs depending on the level of parallelism used
>>> >> (via num-mappers): 2 mappers don't trigger it, but 10 definitely
>>> >> will:
>>> >>
>>> >> 14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
>>> >> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
>>> >> Error: java.io.IOException: mysqlimport terminated with error code 1
>>> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
>>> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
>>> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>> >>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>> >>     at java.security.AccessController.doPrivileged(Native Method)
>>> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>> >>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>> >>
>>> >> I understand there is some limit to the level of parallelism that can
>>> >> be achieved in the job -- mysqld can get tied up processing too many
>>> >> things at once, etc. -- but I'd like to know how to turn on debugging
>>> >> for the org.apache.sqoop.mapreduce.MySQLExportMapper class so that I
>>> >> can actually see the mysqlimport error.
>>> >>
>>> >> Reading through the following code[0] (not sure if this is the
>>> >> relevant version, BTW), I see that a logger is set up that should be
>>> >> giving a lot of information[1] about the mysqlimport calls, but I
>>> >> don't seem to be getting any of it in my logs.
>>> >>
>>> >> [0]
>>> >>
>>> >> https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
>>> >> [1] `LOG.debug("Starting mysqlimport with arguments:");`
>>> >>
>>> >>
>>> >> Additional info:
>>> >>
>>> >> I have log4j.properties setup in the following basic way:
>>> >>
>>> >> log4j.rootLogger=${root.logger}
>>> >> root.logger=INFO,console
>>> >>
>>> >> log4j.logger.org.apache.hadoop.mapred=TRACE
>>> >> log4j.logger.org.apache.sqoop.mapreduce=TRACE
>>> >> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
>>> >>
>>> >> log4j.appender.console=org.apache.log4j.ConsoleAppender
>>> >> log4j.appender.console.target=System.err
>>> >> log4j.appender.console.layout=org.apache.log4j.PatternLayout
>>> >> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
>>> >>
>>> >> What I have found is that the `max_allowed_packet` setting in MySQL
>>> >> seems to affect this behaviour somewhat, but I'd rather get more
>>> >> information about the actual error than attempt to tweak a setting
>>> >> "blind".
>>> >>
>>> >> Relevant versioning:
>>> >>
>>> >> Cloudera Hadoop Distribution (5.1.2)
>>> >> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
>>> >> sqoop version: 1.4.4
>>> >>
>>> >> Kind regards,
>>> >>
>>> >> Christian Verkerk
>>> >>
>>> >> --
>>> >> Christian Verkerk
>>> >> Software Engineer, Tubular Labs
>>> >> christian@tubularlabs.com
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Christian Verkerk
>>> Software Engineer, Tubular Labs
>>> christian@tubularlabs.com
>>
>>
>
