sqoop-dev mailing list archives

From "Shuaishuai Nie" <shuai...@microsoft.com>
Subject Re: Review Request 14085: Review request for SQOOP-1192 Add option "--skip-dist-cache" to allow Sqoop not copying jars in %SQOOP_HOME%\lib folder when launched by Oozie and use Oozie share lib
Date Mon, 14 Oct 2013 22:20:45 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Oct. 14, 2013, 10:20 p.m.)

Review request for Sqoop.

Bugs: SQOOP-1192

Repository: sqoop-trunk


Currently, Sqoop copies the jar files in the %SQOOP_HOME%\lib folder to the job cache every time a
Sqoop job is launched. When Oozie launches a Sqoop job, this behavior can be optimized by adding
these jars to the Oozie Sqoop sharelib. In that case, the jar files in the sharelib only need to be
localized to each worker node once and can then be reused by all Sqoop jobs launched by Oozie. This
can greatly reduce disk I/O on the worker nodes when Sqoop is run through Oozie. To enable this,
Sqoop needs an option that lets the job skip adding the lib jars to the job cache. For now, this
option should only be used for Sqoop jobs started by Oozie. The attached patch introduces the
"--skip-dist-cache" option to enable this feature.
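
As a rough illustration of the intended behavior (a sketch only, not the code in the attached patch): when the flag is set, no lib jars are added to the job cache, and the Oozie sharelib is assumed to provide them instead. The class name DistCacheSketch, the method jarsForJobCache, and the jar names are hypothetical and do not appear in Sqoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration; this is NOT the code in JobBase.java. */
class DistCacheSketch {

    /**
     * Returns the jars that would be added to the job cache.
     *
     * @param libJars       names of the jars found under the Sqoop lib folder
     * @param skipDistCache true when "--skip-dist-cache" was passed
     */
    static List<String> jarsForJobCache(List<String> libJars, boolean skipDistCache) {
        if (skipDistCache) {
            // Assumption: the Oozie sharelib already supplies these jars,
            // so nothing is copied to the job cache.
            return Collections.emptyList();
        }
        // Default behavior: every lib jar is added to the job cache.
        return new ArrayList<>(libJars);
    }

    public static void main(String[] args) {
        // Placeholder jar names, for illustration only.
        List<String> libJars = Arrays.asList("example-a.jar", "example-b.jar");
        System.out.println("default:         " + jarsForJobCache(libJars, false));
        System.out.println("skip-dist-cache: " + jarsForJobCache(libJars, true));
    }
}
```

In this sketch the flag is all-or-nothing, which matches the description above: with the option set, the job relies entirely on jars that are already localized on the worker nodes.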

Diffs (updated)

  src/docs/user/import.txt 71b50d8 
  src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
  src/java/org/apache/sqoop/mapreduce/JobBase.java 322df1c 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java b05f587 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857 
  src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504 

Diff: https://reviews.apache.org/r/14085/diff/


Tested the new option with an Oozie-Sqoop workflow to ensure it doesn't break Sqoop library
dependencies when launched by Oozie.


Shuaishuai Nie
