sqoop-user mailing list archives

From: Christian Verkerk <christ...@tubularlabs.com>
Subject: Re: mysqlimport terminated with error code 1
Date: Mon, 15 Sep 2014 17:18:32 GMT
Hi,

The jobtracker logs are all empty. The --split-by and --boundary-query
options are sqoop import-only arguments AFAICT. The split size, i.e. the
size of each file that gets loaded into MySQL, is about 32MB.

The sqoop export job I posted _does_ get data into MySQL; it just
stops after a while (presumably due to load), so running a single
query against MySQL works just fine and will not reproduce the
error.

The key is that I need some way to get more information on the exact
error mysqlimport hits.
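
One avenue I can think of is pulling the aggregated container logs for
the failed attempt and searching around the mysqlimport lines, roughly
like this (the application id is inferred from the attempt id in the
stack trace below, and this assumes YARN log aggregation is enabled):

    yarn logs -applicationId application_1410271365435_0034 | grep -B2 -A10 mysqlimport

but ideally the mapper itself would log what mysqlimport printed.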

Kind regards,

Christian

On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <tispratik@gmail.com> wrote:
> Is there any reason given for the termination in the jobtracker logs?
> Also, I see that you have not specified any --split-by and/or
> --boundary-query option. If sqoop takes time to determine the splits,
> specifying these options might help.
>
> Also, check what the split sizes are; you may be running into data skew
> depending on the splitting column used (generally the primary key of the
> table). The query is printed in the sqoop logs; try running the same
> query directly against MySQL and see how it responds.
>
> ~Pratik
>
> On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk
> <christian@tubularlabs.com> wrote:
>>
>> Hi,
>>
>> I'm trying to run a sqoop export for a large dataset (at least 1B
>> rows) with the following sqoop export call:
>>
>> sqoop export --direct \
>> --connect <host> \
>> --table <table> \
>> --export-dir /user/hive/warehouse/<table> \
>> --num-mappers 8 \
>> --username <username> \
>> --password <password> \
>> --input-fields-terminated-by ',' \
>> --verbose
>>
>> Behind the scenes, I've found that sqoop export does what you'd expect
>> it to: it farms out the work to num-mappers nodes with a NodeManager
>> role, writes about 32MB worth of HDFS data into a temp file on each of
>> those nodes, and hands it to mysqlimport, which issues a LOAD DATA
>> LOCAL INFILE for the temp file into the MySQL table.
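>>
>> For reference, my understanding is that each task's call boils down to
>> something roughly like the following (paths and the database name are
>> placeholders, not taken from my logs), with mysqlimport deriving the
>> table name from the file's basename and issuing the LOAD DATA LOCAL
>> INFILE itself:
>>
>> mysqlimport --local --host=<host> --user=<username> --password=<password> \
>>   --fields-terminated-by=',' <database> /path/to/<table>.txt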
>>
>> The following error occurs depending on the level of parallelism used
>> (via num-mappers); that is, 2 mappers don't trigger it, but 10
>> definitely will:
>>
>> 14/09/14 17:34:25 INFO mapreduce.Job: map 25% reduce 0%
>> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
>> Error: java.io.IOException: mysqlimport terminated with error code 1
>>     at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
>>     at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>
>> I understand there is some limit to the level of parallelism that can
>> be achieved in the job (mysqld can get tied up processing too many
>> things at once, etc.), but I'd like to know how to turn on debug
>> logging for the org.apache.sqoop.mapreduce.MySQLExportMapper class so
>> that I can actually see the mysqlimport error.
>>
>> Reading through the following code[0] (not sure if this is the
>> relevant version BTW), I see that a logger is set up that should be
>> giving a lot of information[1] about the mysqlimport calls, but I
>> don't seem to be getting any of this fun in my logs.
>>
>> [0]
>> https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
>> [1] `LOG.debug("Starting mysqlimport with arguments:");`
>>
>>
>> Additional info:
>>
>> I have log4j.properties set up in the following basic way:
>>
>> log4j.rootLogger=${root.logger}
>> root.logger=INFO,console
>>
>> log4j.logger.org.apache.hadoop.mapred=TRACE
>> log4j.logger.org.apache.sqoop.mapreduce=TRACE
>> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
>>
>> log4j.appender.console=org.apache.log4j.ConsoleAppender
>> log4j.appender.console.target=System.err
>> log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
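>>
>> That said, I'm not sure this client-side log4j.properties even reaches
>> the map tasks under YARN. If the container log level is what matters,
>> I assume passing the standard Hadoop job property on the command line,
>> something like:
>>
>> sqoop export -D mapreduce.map.log.level=DEBUG --direct \
>>   --connect <host> --table <table> ... (remaining options as above)
>>
>> would be the way to raise it, but please correct me if that's off.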
>>
>> What I have found is that the `max_allowed_packet` setting in MySQL
>> seems to affect this behaviour somewhat, but I'd rather get more
>> information about the actual error than tweak a setting blindly.
>>
>> Relevant versioning:
>>
>> Cloudera Hadoop Distribution (5.1.2)
>> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
>> sqoop version: 1.4.4
>>
>> Kind regards,
>>
>> Christian Verkerk
>>
>> --
>> Christian Verkerk
>> Software Engineer, Tubular Labs
>> christian@tubularlabs.com
>
>



-- 
Christian Verkerk
Software Engineer, Tubular Labs
christian@tubularlabs.com
