sqoop-user mailing list archives

From pratik khadloya <tispra...@gmail.com>
Subject Re: mysqlimport terminated with error code 1
Date Mon, 15 Sep 2014 18:32:42 GMT
Btw, Christian, can you tell me how you fed the custom log4j config to
sqoop?
I am interested in debugging the MySQLManager class.

Thanks,
~Pratik

On Mon, Sep 15, 2014 at 10:39 AM, pratik khadloya <tispratik@gmail.com>
wrote:

> You're right, they are import-only arguments; I misread your original
> question.
> I am surprised that there are no logs in the JT. You should be able to see
> the logs for attempt_1410271365435_0034_m_000000_0
> and also see which machine ran that map task. You can click on the
> machine name, and then on the bottom left there is a "Local logs" link you
> can follow to reach the local mapper logs for that task tracker.
>
> The general url to directly get to those logs is:
> http://<task-tracker-machine-name>:50060/tasktracker.jsp
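>
> Since your stack trace goes through YarnChild, the job is running on YARN;
> in that case the aggregated container logs may also be retrievable from the
> command line once the attempt has finished. Something along these lines
> (illustrative only; the application id is just the job id from your attempt
> id with an "application_" prefix, and log aggregation must be enabled):
>
> yarn logs -applicationId application_1410271365435_0034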
>
> I suspect the loading command may be failing due to a column mismatch or
> some delimiter problem.
>
> ~Pratik
>
> On Mon, Sep 15, 2014 at 10:18 AM, Christian Verkerk <
> christian@tubularlabs.com> wrote:
>
>> Hi,
>>
>> The jobtracker logs are all empty. The --split-by and --boundary-query
>> options are sqoop import-only arguments AFAICT. The split size, as in the
>> size of the file that is loaded into MySQL, is about 32MB.
>>
>> The sqoop export job I posted _does_ get data into MySQL; it just
>> stops after a while (due to load, presumably), so running a single
>> query against MySQL works fine and does not reproduce the
>> error.
>>
>> The key is that I need some way to get more information on the exact
>> error mysqlimport hits.
>>
>> Kind regards,
>>
>> Christian
>>
>> On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <tispratik@gmail.com>
>> wrote:
>> > Is there any reason given for the termination in the jobtracker logs?
>> > Also, I see that you have not specified any --split-by and/or
>> > --boundary-query option.
>> > Does sqoop take time to determine the splits? If so, specifying these
>> > settings might help.
>> >
>> > Also, check what the split sizes are; you may be running into data skew
>> > depending on the splitting column used (generally the primary key of the
>> > table).
>> > The query is printed in the sqoop logs; try running the same query
>> > directly on mysql and see how mysql responds.
>> >
>> > ~Pratik
>> >
>> > On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk
>> > <christian@tubularlabs.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I'm trying to run a sqoop export for a large dataset (at least 1B
>> >> rows) with the following sqoop export call:
>> >>
>> >> sqoop export --direct \
>> >> --connect <host> \
>> >> --table <table> \
>> >> --export-dir /user/hive/warehouse/<table> \
>> >> --num-mappers 8 \
>> >> --username <username> \
>> >> --password <password> \
>> >> --input-fields-terminated-by ',' \
>> >> --verbose
>> >>
>> >> Behind the scenes, I've found that sqoop export does what you'd expect
>> >> it to: it farms the work out to num-mappers different nodes with a
>> >> NodeManager role, gets about 32MB worth of HDFS data into a temp file
>> >> on each of those nodes, and sends it along to mysqlimport, which
>> >> issues a LOAD DATA LOCAL INFILE for the temp file into the
>> >> MySQL table.
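>> >>
>> >> As far as I can tell, each mapper's call is roughly equivalent to
>> >> something like the following (illustrative only -- the exact flags
>> >> Sqoop passes and the temp file path are guesses on my part):
>> >>
>> >> mysqlimport --local --host=<host> --user=<username> --password=<password> \
>> >>   --fields-terminated-by=',' <database> /tmp/sqoop-export/<table>.txt
>> >>
>> >> (mysqlimport derives the target table name from the basename of the
>> >> file, and --local is what makes it issue LOAD DATA LOCAL INFILE.)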
>> >>
>> >> The following error occurs depending on the level of parallelism used
>> >> (via num-mappers), that is, 2 mappers doesn't trigger it but 10
>> >> definitely will:
>> >>
>> >> 14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
>> >> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
>> >> Error: java.io.IOException: mysqlimport terminated with error code 1
>> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
>> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
>> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>> >>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>> >>     at java.security.AccessController.doPrivileged(Native Method)
>> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>> >>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>> >>
>> >> I understand there is some limit to the level of parallelism that can
>> >> be achieved in the job -- mysqld can get tied up processing too many
>> >> things at once etc. but I'd like to know how to turn the debugging on
>> >> for the org.apache.sqoop.mapreduce.MySQLExportMapper class so that I
>> >> can actually see the mysqlimport error.
>> >>
>> >> Reading through the following code[0] (not sure if this is the
>> >> relevant version BTW), I see that a logger is set up that should be
>> >> giving a lot of information[1] about the mysqlimport calls, but I
>> >> don't seem to be getting any of this fun in my logs.
>> >>
>> >> [0]
>> >>
>> https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
>> >> [1] `LOG.debug("Starting mysqlimport with arguments:");`
>> >>
>> >>
>> >> Additional info:
>> >>
>> >> I have log4j.properties set up in the following basic way:
>> >>
>> >> log4j.rootLogger=${root.logger}
>> >> root.logger=INFO,console
>> >>
>> >> log4j.logger.org.apache.hadoop.mapred=TRACE
>> >> log4j.logger.org.apache.sqoop.mapreduce=TRACE
>> >> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
>> >>
>> >> log4j.appender.console=org.apache.log4j.ConsoleAppender
>> >> log4j.appender.console.target=System.err
>> >> log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> >> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
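>> >>
>> >> Presumably the log4j.properties above only affects the Sqoop client
>> >> JVM, while MySQLExportMapper runs inside the YARN container, so its
>> >> log level is governed by the task-side configuration. If so, raising
>> >> the map task log level through the job configuration might be what is
>> >> needed (untested; generic -D options go right after the tool name):
>> >>
>> >> sqoop export -D mapreduce.map.log.level=DEBUG --direct \
>> >>   --connect <host> ...   # rest of the arguments as above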
>> >>
>> >> What I have found is that the `max_allowed_packet` setting in MySQL
>> >> seems to affect this behaviour somewhat but I'd rather get more
>> >> information about the actual error than attempt to tweak a setting
>> >> "blind".
>> >>
>> >> Relevant versioning:
>> >>
>> >> Cloudera Hadoop Distribution (5.1.2)
>> >> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
>> >> sqoop version: 1.4.4
>> >>
>> >> Kind regards,
>> >>
>> >> Christian Verkerk
>> >>
>> >> --
>> >> Christian Verkerk
>> >> Software Engineer, Tubular Labs
>> >> christian@tubularlabs.com
>> >
>> >
>>
>>
>>
>> --
>> Christian Verkerk
>> Software Engineer, Tubular Labs
>> christian@tubularlabs.com
>>
>
>
