sqoop-dev mailing list archives

From "Ahmed El Baz" <ahel...@microsoft.com>
Subject Re: Review Request: Request to review patch for SQOOP-954: Create Sqoop runtime scripts to run Sqoop on Windows
Date Mon, 22 Apr 2013 03:29:08 GMT


> On March 29, 2013, 7:03 p.m., Venkat Ranganathan wrote:
> > Hi Ahmed
> > 
> > Thanks for the new patch. It looks good. I still have one issue and one suggestion.
> > The PowerShell script to generate the jar file is very good! However, you are generating
> > a jar file every time, and it is generated under SQOOP_HOME. In some installations
> > SQOOP_HOME may not be writable by the user. Also, I think the main motivation is to
> > overcome the limit on command-line (environment string) length. Since JDK 1.6, Java
> > supports a wildcard shortcut in the classpath that includes all jars in a directory
> > (this should probably be done for the Unix classpaths as well). Please see
> > http://docs.oracle.com/javase/6/docs/technotes/tools/windows/classpath.html
> > 
> > I wonder whether this could be a simpler change that just adds all jars in SQOOP_LIB;
> > we would only have to say %SQOOP_HOME%\lib\*. Of course, this introduces a dependency
> > on JDK 1.6+, but given that 1.5 is EOLed, this should be OK.
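The wildcard semantics referred to above can be illustrated with a minimal Python emulation of the rules in the linked JDK classpath documentation: an entry whose basename is `*` stands for every `.jar`/`.JAR` file in that directory, subdirectories are not searched, and the enumeration order is unspecified. `expand_classpath` is a hypothetical name; this is a sketch of the documented behavior, not the JVM's code.

```python
from __future__ import annotations

import os
from pathlib import Path


def expand_classpath(classpath: str, sep: str = os.pathsep) -> list[str]:
    """Roughly emulate how JDK 6+ expands `dir/*` classpath entries."""
    expanded: list[str] = []
    for entry in classpath.split(sep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        path = Path(entry)
        if path.name == "*":
            directory = path.parent  # a bare "*" means the current directory
            if directory.is_dir():
                # Only .jar/.JAR files directly inside the directory; no recursion.
                # Sorting just makes the output deterministic: the JVM's order is unspecified.
                jars = sorted(
                    p for p in directory.iterdir()
                    if p.is_file() and p.name.endswith((".jar", ".JAR"))
                )
                expanded.extend(str(p) for p in jars)
        else:
            expanded.append(entry)  # non-wildcard entries pass through unchanged
    return expanded
```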
> > 
> > Thanks
> 
> Ahmed El Baz wrote:
>     Thank you, Venkat, for the valuable comments.
>     
>     I have considered the wildcard option; however, it has some limitations that made it
>     less preferable, and using the referencing jar gives more flexibility:
>     1) We may need to include particular jars, or exclude some, rather than include every
>     jar in a directory by default through the wildcard. For example, in configure-sqoop the
>     list of HBase dependency jars is obtained by invoking "hbase classpath", which returns
>     a list of jars. A wrapper jar frees us from worrying about the length of that list, and
>     the * cannot be used in this case unless we add logic to find common directories.
>     2) As you can also see in configure-sqoop, Sqoop depends on locations other than
>     SQOOP_HOME\lib, such as HBase, SQOOP_CONF, and ZOOCFGDIR.
>     3) The wrapper jar scales regardless of how many directories we include. I understand
>     it is unlikely that the number of folders grows to the point where we hit the
>     long-command-line error, but even then the wrapper jar would work just fine.
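The "logic to get common dirs" mentioned in point 1 might look like the following minimal sketch. `collapse_to_wildcards` is a hypothetical helper: it replaces a group of jars with `dir/*` only when the group covers every jar in that directory, and otherwise keeps the explicit jars. Note that grouping by directory reorders classpath entries, which can change which class wins when the same class appears in several jars; that is one more reason an explicit list (or a wrapper jar) can be safer.

```python
from __future__ import annotations

from pathlib import Path


def _jar_names(directory: Path) -> set[str]:
    """Names of the .jar/.JAR files directly inside `directory` (not recursive)."""
    if not directory.is_dir():
        return set()
    return {
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.endswith((".jar", ".JAR"))
    }


def collapse_to_wildcards(jars: list[str]) -> list[str]:
    """Hypothetical: rewrite a jar list using `dir/*` where a whole directory is listed."""
    groups: dict[Path, list[str]] = {}
    for jar in jars:
        # Dicts keep insertion order, so groups appear in first-seen order.
        groups.setdefault(Path(jar).parent, []).append(jar)

    result: list[str] = []
    for directory, listed in groups.items():
        on_disk = _jar_names(directory)
        listed_names = {Path(j).name for j in listed}
        if on_disk and listed_names == on_disk:
            result.append(str(directory / "*"))  # full coverage: one wildcard entry
        else:
            result.extend(listed)  # partial coverage: keep the explicit jars
    return result
```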
>     
>     I would like to understand more about the scenarios where we anticipate that SQOOP_HOME
>     would not be writable on Windows systems.
>     
>     Thank you again,
>     Ahmed
> 
> Venkat Ranganathan wrote:
>     Thanks Ahmed for the explanation.
>     
>     I thought we are primarily limited by the 8K command-line limit, so if we can express
>     the large jar directories in this format, the command would fit within the limit.
>     Good point about the hbase -classpath option. Maybe HBase can be improved to return
>     the HBase classpath with the jar directories properly added.
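The 8K figure corresponds to cmd.exe's maximum command-line length of 8191 characters. A back-of-the-envelope check, assuming hypothetical jar paths of 52 characters each: 150 such jars joined with `;` take 150 × 52 + 149 = 7949 characters, which together with an assumed 500 characters for the rest of the `java` command line exceeds the limit, while a single `lib\*` wildcard entry does not. The function names below are illustrative only.

```python
from __future__ import annotations

# cmd.exe rejects command lines longer than 8191 characters (the "8K" limit).
CMD_EXE_MAX_CHARS = 8191


def classpath_length(entries: list[str], sep: str = ";") -> int:
    """Length of the classpath string as it would appear on a Windows command line."""
    return len(sep.join(entries))


def fits_command_line(
    entries: list[str], other_args_len: int, limit: int = CMD_EXE_MAX_CHARS
) -> bool:
    """Estimate whether `java ... -cp <classpath> ...` stays within the limit.

    `other_args_len` is an assumed length for everything except the classpath
    (java.exe path, JVM options, main class, program arguments).
    """
    return other_args_len + classpath_length(entries) <= limit


# Hypothetical paths, 52 characters each.
jars = [f"C:\\hadoop\\share\\hadoop\\common\\lib\\dependency-{i:03d}.jar" for i in range(150)]
wildcard = ["C:\\hadoop\\share\\hadoop\\common\\lib\\*"]
```

Here `fits_command_line(jars, 500)` is `False` and `fits_command_line(wildcard, 500)` is `True`.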
>     
>     For example, people installing Hadoop on Windows may decide to install the Hadoop stack
>     on a terminal server shared across multiple users, or in a common location mapped by
>     logon scripts. The directory can then be inaccessible (not writable) for people running
>     Sqoop jobs. This scheme is used by some Hadoop distributions today.
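For shared installs like these, one option is to generate per-run artifacts outside SQOOP_HOME when it is not writable. A minimal sketch, assuming a fallback to the per-user temp directory; the function names and the fallback choice are hypothetical, not what the patch does. Writability is probed by actually creating a file, because on Windows `os.access(path, os.W_OK)` only looks at the read-only attribute (and always reports directories as writable), ignoring ACLs.

```python
from __future__ import annotations

import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Probe writability by creating (and automatically deleting) a temp file."""
    try:
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def choose_output_dir(sqoop_home: str) -> str:
    """Hypothetical policy: use SQOOP_HOME if writable, else the user's temp dir."""
    if os.path.isdir(sqoop_home) and is_writable_dir(sqoop_home):
        return sqoop_home
    return tempfile.gettempdir()
```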
>     
>     Thanks
>
> 
> Venkat Ranganathan wrote:
>     I had this comment written before, but it got caught in the saved reviews instead of
>     being published. Sorry about that. Can you check my comments, and can we simplify this?

Thank you, Venkatesh,

I have updated the patch to use the jar directories for classpath locations, rather than the
PowerShell script that generated a single jar encapsulating the classpath in its manifest. As
discussed, we will need a corresponding change for the HBase case, where hbase.cmd -classpath
is invoked to return a list of jar files. For now, on Windows we use HBASE_HOME and
HBASE_HOME\lib.
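The updated approach can be pictured with a minimal sketch that assembles a `;`-separated classpath of directory wildcards from environment variables. It assumes the entries are `%SQOOP_HOME%\lib\*`, `%HBASE_HOME%\*` and `%HBASE_HOME%\lib\*`, and it skips unset variables; the actual `configure-sqoop.cmd` logic is not shown in this message, so `windows_classpath` is a hypothetical illustration, not the patch.

```python
from __future__ import annotations

import ntpath  # Windows path rules, usable on any host OS


def windows_classpath(env: dict[str, str]) -> str:
    """Hypothetical: build a Windows classpath of directory wildcards."""
    entries: list[str] = []
    sqoop_home = env.get("SQOOP_HOME")
    if sqoop_home:
        entries.append(ntpath.join(sqoop_home, "lib", "*"))
    hbase_home = env.get("HBASE_HOME")
    if hbase_home:
        # Stand-in for the output of `hbase classpath` until HBase offers directory entries.
        entries.append(ntpath.join(hbase_home, "*"))
        entries.append(ntpath.join(hbase_home, "lib", "*"))
    return ";".join(entries)
```

For example, `windows_classpath({"SQOOP_HOME": "C:\\sqoop"})` yields `C:\sqoop\lib\*`.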

Thanks,
Ahmed


- Ahmed


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10055/#review18523
-----------------------------------------------------------


On April 22, 2013, 3:26 a.m., Ahmed El Baz wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10055/
> -----------------------------------------------------------
> 
> (Updated April 22, 2013, 3:26 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> A patch implementing the Windows version of the Sqoop run scripts. The scripts follow the
> same logic as their .sh counterparts.
> One difference is that a jar referencing all classpath elements in its manifest is created,
> and that jar is provided as the single jar needed for Sqoop. The reason is that when the
> number of classpath elements is large, HADOOP_CLASSPATH gets very long, which causes
> failures on Windows because of the command-line length limit.
> As a workaround, I added a step to wrap all jars in the classpath in a single jar and then
> use that generated jar (this is also done in Hadoop for Windows to handle similar issues).
> I did this in a utility script, "BuildJar", which can be used for other components as well.
> This change is specific to the Windows scripts; the Linux scripts are not affected.
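The wrapper-jar technique described above can be sketched in Python as follows. The original "BuildJar" utility was a PowerShell script whose source is not shown here, so `build_classpath_jar` is a hypothetical stand-in. The jar contains only `META-INF/MANIFEST.MF`, whose `Class-Path` attribute lists every entry as a space-separated `file:` URL, folded into lines of at most 72 bytes with continuation lines starting with a single space, as the JAR file specification requires. Directory URLs need a trailing `/` (otherwise they are treated as jars), and wildcards must be expanded beforehand because the `Class-Path` header does not honor them.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# The JAR spec limits manifest lines to 72 bytes; longer values are folded onto
# continuation lines that begin with a single space.
MAX_LINE_BYTES = 72


def fold_manifest_header(name: str, value: str) -> str:
    """Return `name: value` folded into CRLF-terminated manifest lines.

    Assumes `value` is ASCII (true for the percent-encoded URLs built below),
    so counting characters is the same as counting bytes.
    """
    line = f"{name}: {value}"
    chunks = [line[:MAX_LINE_BYTES]]
    rest = line[MAX_LINE_BYTES:]
    while rest:
        # One leading space marks a continuation, leaving 71 bytes of payload.
        chunks.append(" " + rest[: MAX_LINE_BYTES - 1])
        rest = rest[MAX_LINE_BYTES - 1 :]
    return "".join(chunk + "\r\n" for chunk in chunks)


def build_classpath_jar(entries: list[str], jar_path: str) -> None:
    """Hypothetical: write a jar whose manifest Class-Path lists `entries`."""
    urls = []
    for entry in entries:
        path = Path(entry).resolve()
        url = path.as_uri()  # percent-encodes spaces, which separate Class-Path URLs
        if path.is_dir():
            url += "/"  # without the slash the URL would be treated as a jar file
        urls.append(url)
    manifest = (
        "Manifest-Version: 1.0\r\n"
        + fold_manifest_header("Class-Path", " ".join(urls))
        + "\r\n"  # blank line terminates the main section
    )
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest)
```

Passing only the generated jar on the command line (for example `java -cp classpath.jar <MainClass>`) would then keep the command short regardless of how many entries the manifest lists.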
> 
> 
> This addresses bug SQOOP-954.
>     https://issues.apache.org/jira/browse/SQOOP-954
> 
> 
> Diffs
> -----
> 
>   bin/configure-sqoop.cmd PRE-CREATION 
>   bin/sqoop.cmd PRE-CREATION 
>   conf/sqoop-env-template.cmd PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10055/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Ahmed El Baz
> 
>

