sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andre Araujo <ara...@pythian.com>
Subject Re: Intermittent problems with sqoop using Oracle JDBC driver
Date Mon, 15 Jul 2013 12:34:56 GMT
Thanks, David. I'll give it a go.

BTW, here's the link of the published solution:
http://www.pythian.com/blog/connection-resets-when-importing-from-oracle-with-sqoop/


On 15 July 2013 13:31, David Robson <David.Robson@software.dell.com> wrote:

>  I think I worked out why my emails weren’t getting through to the
> mailing list – my email has changed from @quest.com to @software.dell.com.
> ****
>
> ** **
>
> Have you tried OraOop? It would be interesting to know if it works out of
> the box – as I have hopefully fixed it so this wouldn’t occur. It may still
> not get set for the initial connection – as some Sqoop code was calling the
> security manager before I could get a chance to set the option – but it
> should print a warning if that happens so you know to update your config.
> It will also fix a lot of throughput issues that you would get with the
> standard connector for large data volumes / large number of mappers.****
>
> ** **
>
> *David Robson*****
>
> Software Developer****
>
> *Dell** *| R&D, Quest Software****
>
> *office *+61 3 9811 8082****
>
> ** **
>
> Quest Software is now part of Dell****
>
> ** **
>
> *From:* Andre Araujo [mailto:araujo@pythian.com]
> *Sent:* Monday, 15 July 2013 1:12 PM
> *To:* David Robson
> *Cc:* user@sqoop.apache.org
>
> *Subject:* Re: Intermittent problems with sqoop using Oracle JDBC driver**
> **
>
>  ** **
>
> Thanks, David.****
>
> ** **
>
> My blog post is pending revision and should be published soon. I'll post
> the final link when it does.****
>
> For the time being, please see below a copy of it without the formatting.
> What worked for me was a combination of two things:****
>
>    - passing the "-D
>    mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom""
>    parameter to sqoop****
>    - setting the java.security.egd parameter in the HADOOP_OPTS variable,
>    so that it was passed to "${HADOOP_COMMON_HOME}/bin/hadoop
>     org.apache.sqoop.Sqoop"****
>
>  Regards,****
>
> Andre****
>
> ** **
>
> -----------------------------------****
>
> ** **
>
> I’ve been using Sqoop to load data into HDFS from Oracle. I’m using
> version 1.4.3 of Sqoop, running on a Linux machine and using the Oracle
> JDBC driver with JDK 1.6.****
>
> ** **
>
> I was getting intermittent connection resets when trying to import data.
> After much troubleshooting, I eventually found the problem to be related to
> a known issue with the JDBC driver and found a way to work around it, which
> is described in the post****
>
> ** **
>
> ** **
>
> The problem****
>
> ** **
>
> I noticed that when I was importing data at times where the machine I was
> running the sqoop client at was mostly idle, everything would run just
> fine. However, at times when others started to work on the same machine and
> it became a bit busier, I would start to get the errors below
> intermittently:****
>
> ** **
>
> [araujo@client ~]$ sqoop import --connect jdbc:oracle:thin:user/pwd@host/orcl
> -m 1 --query 'select 1 from dual where $CONDITIONS' --target-dir test****
>
> 13/07/12 09:35:39 INFO manager.SqlManager: Using default fetchSize of 1000
> ****
>
> 13/07/12 09:35:39 INFO tool.CodeGenTool: Beginning code generation****
>
> 13/07/12 09:37:53 ERROR manager.SqlManager: Error executing statement:
> java.sql.SQLRecoverableException: IO Error: Connection reset****
>
>             at
> oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:467)****
>
>             at
> oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:546)****
>
>         ...****
>
> Caused by: java.net.SocketException: Connection reset****
>
>             at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)****
>
>             at
> java.net.SocketOutputStream.write(SocketOutputStream.java:136)****
>
>             ... 24 more****
>
> After some troubleshooting and googling, I found that the problem seemed
> to be related to the issue described in the following articles:****
>
> ** **
>
>
> http://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connection-issue/
> ****
>
> https://forums.oracle.com/message/3701989/****
>
> ** **
>
> Confirming the problem****
>
> ** **
>
> To ensure the problem was the same as the one described in the articles,
> and not something else intrinsic to Sqoop, I created a small Java program
> that simply connected to the database. I was able to reproduce the issue
> using it:****
>
> ** **
>
> [araujo@client TestConn]$ time java TestConn****
>
> Exception in thread "main" java.sql.SQLRecoverableException: IO Error:
> Connection reset****
>
> ...****
>
> Caused by: java.net.SocketException: Connection reset****
>
> ...****
>
>             ... 8 more****
>
> ** **
>
> real      1m20.481s****
>
> user      0m0.491s****
>
> sys       0m0.051s****
>
> The workaround suggested in the articles also worked:****
>
> ** **
>
> [araujo@client TestConn]$ time java
> -Djava.security.egd=file:/dev/../dev/urandom TestConn****
>
> Connection successful!****
>
> ** **
>
> real      0m0.419s****
>
> user      0m0.498s****
>
> sys       0m0.036s****
>
> Applying the fix to Sqoop****
>
> ** **
>
> It took me a while to figure out how to use the workaround above with
> Sqoop. Many tentatives to specify the parameter in the Sqoop command line,
> in many different forms, didn’t work as expected.****
>
> ** **
>
> The articles mention that the java.security.egd parameter can be centrally
> set in the $JAVA_HOME/jre/lib/security/java.security file. Unfortunately,
> this didn’t work for me. Using strace, I confirmed that Sqoop was actually
> reading the java.security file but the setting just didn’t take effect. I
> couldn’t figure out why not and eventually gave up that alternative.****
>
> ** **
>
> After a bit more of stracing and troubleshooting, though, I finally
> figured a way out.****
>
> ** **
>
> Sqoop seems to use the JDBC driver in two different ways:****
>
> ** **
>
> First, it connects to the Oracle database directly. It does that to gather
> more information about the tables (or query) from where the data is
> extracted and generate the map reduce job that it will run.****
>
> Second, the map reduce job generated by Sqoop uses the JDBC driver to
> connect to the database and perform the actual data import.****
>
> I was hitting the problem in the first case above, but I believe in both
> cases there’s a potential for the problem to occur. So, ideally, we should
> apply the workaround to both cases.****
>
> ** **
>
> The Sqoop documentation clearly gives us an option to address the second
> case: using the following parameter to Sqoop allows us to pass Java command
> line options to the map reduce job:****
>
> ** **
>
> sqoop import -D
> mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom" ...*
> ***
>
> Even though I couldn’t fully prove the above, since I couldn’t
> consistently reproduce the problem for the map reduce tasks, I believe (and
> hope) it should work well.****
>
> ** **
>
> The Sqoop direct connection to Oracle****
>
> ** **
>
> The problem with the direct connection from Sqoop to Oracle, though,
> wasn’t resolved by that option. Trying to pass the
> “-Djava.security.egd=file:/dev/../dev/urandom” option directly to Sqoop
> didn’t work either.****
>
> ** **
>
> After digging up a bit I found that the sqoop command eventually calls
> ${HADOOP_COMMON_HOME}/bin/hadoop to execute the org.apache.sqoop.Sqoop
> class. Since the hadoop executable is used, it accepts Java command line
> options through the HADOOP_OPTS environment variable.****
>
> ** **
>
> A quick test confirmed that the case was closed:****
>
> ** **
>
> [araujo@client STimport]$ export
> HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom****
>
> [araujo@client STimport]$ sqoop import -D
> mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom"
> --connect jdbc:oracle:thin:user/pwd@host/orcl -m 1 --query 'select 1 from
> dual where $CONDITIONS' --target-dir test ****
>
> 13/07/12 10:08:17 INFO manager.SqlManager: Using default fetchSize of 1000
> ****
>
> 13/07/12 10:08:17 INFO tool.CodeGenTool: Beginning code generation****
>
> 13/07/12 10:08:18 INFO manager.OracleManager: Time zone has been set to GMT
> ****
>
> 13/07/12 10:08:18 INFO manager.SqlManager: Executing SQL statement: select
> 1 from dual where  (1 = 0) ****
>
> 13/07/12 10:08:18 INFO manager.SqlManager: Executing SQL statement: select
> 1 from dual where  (1 = 0) ****
>
> 13/07/12 10:08:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/bin/../lib/hadoop-0.20-mapreduce
> ****
>
> 13/07/12 10:08:18 INFO orm.CompilationManager: Found hadoop core jar at:
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/bin/../lib/hadoop-0.20-mapreduce/hadoop-core.jar
> ****
>
> Note:
> /tmp/sqoop-araujo/compile/02ed1ccf04debf4769910b93ca67d2ba/QueryResult.java
> uses or overrides a deprecated API.****
>
> Note: Recompile with -Xlint:deprecation for details.****
>
> 13/07/12 10:08:19 INFO orm.CompilationManager: Writing jar file:
> /tmp/sqoop-araujo/compile/02ed1ccf04debf4769910b93ca67d2ba/QueryResult.jar
> ****
>
> 13/07/12 10:08:19 INFO mapreduce.ImportJobBase: Beginning query import.***
> *
>
> 13/07/12 10:08:19 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.***
> *
>
> 13/07/12 10:08:21 INFO mapred.JobClient: Running job: job_201306141710_0075
> ****
>
> 13/07/12 10:08:22 INFO mapred.JobClient:  map 0% reduce 0%****
>
> 13/07/12 10:08:31 INFO mapred.JobClient:  map 100% reduce 0%****
>
> 13/07/12 10:08:33 INFO mapred.JobClient: Job complete:
> job_201306141710_0075****
>
> 13/07/12 10:08:33 INFO mapred.JobClient: Counters: 23****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:   File System Counters****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of bytes read=0*
> ***
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of bytes
> written=179438****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of read
> operations=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of large read
> operations=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     FILE: Number of write
> operations=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of bytes read=87
> ****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of bytes
> written=2****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of read
> operations=1****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of large read
> operations=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     HDFS: Number of write
> operations=1****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:   Job Counters ****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Launched map tasks=1****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all maps
> in occupied slots (ms)=7182****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all
> reduces in occupied slots (ms)=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:   Map-Reduce Framework****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Map input records=1****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Map output records=1****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Input split bytes=87****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Spilled Records=0****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     CPU time spent (ms)=940****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Physical memory (bytes)
> snapshot=236580864****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Virtual memory (bytes)
> snapshot=34998603776****
>
> 13/07/12 10:08:33 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=1013252096****
>
> 13/07/12 10:08:33 INFO mapreduce.ImportJobBase: Transferred 2 bytes in
> 14.4198 seconds (0.1387 bytes/sec)****
>
> 13/07/12 10:08:33 INFO mapreduce.ImportJobBase: Retrieved 1 records.****
>
> - See more at:
> http://www.pythian.com/blog/?p=56405&preview=true&preview_id=56405&preview_nonce=7dfd5e1733#sthash.mUjOwAdP.dpuf
> ****
>
> ** **
>
> ** **
>
> ** **
>
> ** **
>
> On 15 July 2013 12:06, David Robson <David.Robson@software.dell.com>
> wrote:****
>
> Hi Andre,****
>
>  ****
>
> I sent this reply earlier – but I don’t think it went through to the list
> – so sorry if this ends up being a repost (I also BCCd you).****
>
>  ****
>
> I had a similar issue – you can see more details here:
> https://questmos.jira.com/browse/ORAOOP-2****
>
>  ****
>
> Basically if you use OraOop 1.6 – it should set this for you
> automatically. The only problem is if someone has set the parameter as
> final on the server – then obviously the job can’t override it.****
>
>  ****
>
> I think the reason it’s not working for you is you need to set it as
> follows:****
>
>  ****
>
> -Dmapred.child.java.opts="-Djava.security.egd=file:///dev/urandom"****
>
>  ****
>
> You could try that – or download OraOop from here:
> http://www.toadworld.com/m/freeware/565.aspx****
>
>  ****
>
> It is Apache 2.0 licensed and the source code is available
> https://github.com/QuestSoftwareTCD/OracleSQOOPconnector****
>
>  ****
>
> *David Robson*****
>
> Software Developer****
>
> *Dell** *| R&D, Quest Software****
>
> *office *+61 3 9811 8082****
>
>  ****
>
> Quest Software is now part of Dell****
>
>  ****
>
> *From:* Andre Araujo [mailto:araujo@pythian.com]
> *Sent:* Monday, 15 July 2013 10:30 AM
> *To:* user@sqoop.apache.org
> *Subject:* Re: Intermittent problems with sqoop using Oracle JDBC driver**
> **
>
>  ****
>
> Hi, Jarek,****
>
>  ****
>
> I'll fix that and reply to this email when the link is working.****
>
>  ****
>
> Thanks for pointing that out.****
>
>  ****
>
> Andre****
>
>  ****
>
> On 13 July 2013 01:35, Jarek Jarcec Cecho <jarcec@apache.org> wrote:****
>
> Hi Andre,
> thank you for sharing the solution with us! The link do not seem to be
> valid, would you mind checking if it's correct?
>
> Jarcec****
>
>
> On Sat, Jul 13, 2013 at 01:05:19AM +1000, Andre Araujo wrote:
> > Hi, all,
> >
> > I've found the answer for my own question.
> > The solution is explained here:
> > http://www.pythian.com/blog/?p=56405&preview=true
> >
> > Regards,
> > Andre
> >
> >
> >
> > On 12 July 2013 15:57, Andre Araujo <araujo@pythian.com> wrote:
> >
> > > Hi,
> > >
> > > While using sqoop to import data from an Oracle database I was noticing
> > > intermittent problems with sqoop hanging. After some troubleshooting I
> > > narrowed the problem down to the issue described in the following
> pages:
> > >
> > >
> > >
> http://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connect=
> > > ion-issue
> > > https://forums.oracle.com/message/3701989
> > >
> > > The problem seems to be directly related to the Oracle JDBC driver. As
> per
> > > the articles above, I can tell the driver to use /dev/urandom by using
> the
> > > option -Djava.security.egd. I tested it with a standalone Java program
> > > connecting to the database using the JDBC driver and it works well.
> > >
> > > sqoop, though, uses JDBC in two ways. It connects directly to the
> database
> > > thru JDBC to gather details about the table/query and generate the
> mapred
> > > job. It also uses the driver indirectly in the mapred tasks.
> > >
> > > We can use the -D mapred.child.java.opts=3D"-Djava.security.egd=3D..."
> > > opti=
> > > on
> > > of sqoop to tell the mapred tasks to use the option. However, I haven't
> > > found any way to tell sqoop to use it while connecting directly to the
> > > database. Because of this, it hangs soon after I start it and even
> before
> > > it kicks off the mapred job.
> > >
> > > I also tried to pass the java.security.egd option directly to sqoop
> with
> > > the following command line but it didn't work:
> > >
> > > -D java.security.egd=3D/dev/urandom -D mapred.child.java.opts=3D"-D
> > > java.security.egd=3D/dev/urandom"
> > >
> > > I added the option to the java.security file, but for some reason it
> also
> > > didn't sort effect for sqoop.
> > >
> > > Any idea on a workaround/solution for this?
> > >
> > > Thanks,
> > > Andre
> > >
> > > --
> > > André Araújo
> > > Database Administrator / SDM
> > > The Pythian Group - Australia - www.pythian.com
> > >
> > > Office (calls from within Australia): 1300 366 021 x1270****
>
> > > Office (international): +61 2 9191 7427 x270<%2B61%202%209191%207427%20%20x270>*OR*
+1
> 613 565 8696 x1270 <%2B1%20613%20565%208696%20%20%20x1270>****
>
> > > Mobile: +61 410 323 559
> > > Fax: +61 2 9805 0544
> > > IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk
> > >
> > >
> >
> >
> > --
> > André Araújo
> > Database Administrator / SDM
> > The Pythian Group - Australia - www.pythian.com
> >
> > Office (calls from within Australia): 1300 366 021 x1270****
>
> > Office (international): +61 2 9191 7427 x270<%2B61%202%209191%207427%20%20x270>*OR*
+1
> 613 565 8696 x1270 <%2B1%20613%20565%208696%20%20%20x1270>****
>
> > Mobile: +61 410 323 559
> > Fax: +61 2 9805 0544
> > IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk
> >
> > --
> >
> >****
>
> > --
> >
> >
> >****
>
>
>
> ****
>
>  ****
>
> -- ****
>
> André Araújo
> Database Administrator / SDM
> The Pythian Group - Australia - www.pythian.com****
>
> Office (calls from within Australia): 1300 366 021 x1270
> Office (international): +61 2 9191 7427  x270 *OR* +1 613 565 8696   x1270
> Mobile: +61 410 323 559
> Fax: +61 2 9805 0544
> IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk****
>
>  ****
>
>  ****
>
> --****
>
>  ****
>
>
>
> ****
>
> ** **
>
> -- ****
>
> André Araújo
> Database Administrator / SDM
> The Pythian Group - Australia - www.pythian.com****
>
> Office (calls from within Australia): 1300 366 021 x1270
> Office (international): +61 2 9191 7427  x270 *OR* +1 613 565 8696   x1270
> Mobile: +61 410 323 559
> Fax: +61 2 9805 0544
> IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk****
>
> ** **
>
> ** **
>
> --****
>
> ** **
>



-- 
André Araújo
Database Administrator / SDM
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 9191 7427  x270 *OR* +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com @ GTalk

-- 


--




Mime
View raw message