sqoop-dev mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Classpath isolation in Sqoop2
Date Wed, 16 Sep 2015 16:27:50 GMT
Thanks for stepping up and taking on this huge and important effort.

I believe that both suggestions - using shading for the client and a custom classloader for
connector code - are completely reasonable, so +1 from my side.

Jarcec

> On Sep 15, 2015, at 7:30 PM, Fu, Dian <dian.fu@intel.com> wrote:
> 
> Hi all,
> 
> 
> Currently there is no classpath isolation in Sqoop 2. This causes problems in the
> following two cases:
> 
> 1)  if the dependencies of downstream users of the Sqoop 2 client conflict with
> the dependencies of the Sqoop 2 client
> 
> 2)  if the dependencies of third-party connectors conflict with the dependencies of the
> Sqoop 2 server or with those of other third-party connectors
> 
> 
> 
> I'd like to provide classpath isolation in Sqoop 2 and have taken some time to investigate
> the status of classpath isolation in Hadoop. Here is a short summary of the problem and the
> solution proposed by Sean in HADOOP-11656 <https://issues.apache.org/jira/browse/HADOOP-11656>:
> 
> The problems HADOOP-11656 tries to solve:
> 
>   1) Client side classpath isolation: between Hadoop and its downstream applications which
> talk directly with HDFS or submit YARN applications.
> 
>   2) Framework level classpath isolation: between the YARN server and the ApplicationMaster,
> or between YARN and the user application. Hadoop already has a solution for this issue: a
> webapp-style (parent last) classloader named ApplicationClassLoader.
> 
> The solution proposed in HADOOP-11656 by Sean:
> 
>   1) For the client side classpath isolation, Sean proposes to use the Maven Shade Plugin
> to expose only the public API to clients and to use its relocation capability to relocate
> other dependencies under the package org.apache.hadoop.shaded. (Refer to HADOOP-11804
> <https://issues.apache.org/jira/browse/HADOOP-11804> for details)
> 
>   2) For the existing webapp-style classloader solution for framework level classpath
> isolation, Sean pointed out that it doesn't provide much upgrade help for applications that
> rely on the classes found in the fallback case. That is to say, if user code relies on a
> Hadoop dependency implicitly and Hadoop upgrades it to an incompatible version, the user
> code will break. Sean proposes to use an OSGi container to export a different set of
> dependencies in different Hadoop versions to solve this issue. (More discussion about this
> can be found here:
> <https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14540773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14540773>)
> 
> 
> 
> Based on the above, the classpath isolation problem in Sqoop 2 can be separated into
> two parts:
> 
> 1)  client side classpath isolation
> 
> 2)  isolation for connectors
> 
> And we have three options to consider:
> 
> 1)  Maven Shade Plugin
> 
> 2)  Webapp-style classloader
> 
> 3)  OSGi
> 
> I'd like to use the Maven Shade Plugin to solve the client side classpath isolation problem
> in a similar way to HADOOP-11804 <https://issues.apache.org/jira/browse/HADOOP-11804>.
> 
> For the isolation of connectors, the Maven Shade Plugin isn't an option: it isolates the
> classpath through relocation at build time, so it can't relocate connector dependencies
> at runtime. Between the webapp-style classloader and OSGi, we may need to choose OSGi if
> we want to upgrade Sqoop 2 dependencies without affecting third-party connectors that rely
> on some Sqoop 2 dependencies implicitly. But if we think that requiring third-party
> connectors to upgrade accordingly is acceptable, I would prefer the webapp-style
> classloader, as it is easier to implement than OSGi.
> 
> 
> 
> Please feel free to provide your opinions, thanks a lot.
> 
> 
> 
> Regards,
> 
> Dian
> 
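
For illustration only, the webapp-style (parent last) classloader discussed in the quoted
proposal could be sketched roughly as below. This is a minimal sketch under stated
assumptions, not code from Sqoop 2 or from Hadoop's ApplicationClassLoader: the class name
ParentLastClassLoader, the isParentFirst helper and the "org.apache.sqoop." prefix are all
hypothetical. Each connector would get its own loader over its own jars; classes are looked
up in those jars first, and only the JDK plus a configured set of shared API packages are
always delegated to the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical parent-last classloader: looks in its own URLs before asking the parent.
class ParentLastClassLoader extends URLClassLoader {

    // Package prefixes that must always be loaded by the parent, so that the
    // server and the connector share the same API classes (assumed values).
    private final String[] parentFirstPrefixes;

    ParentLastClassLoader(URL[] urls, ClassLoader parent, String... parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = parentFirstPrefixes.clone();
    }

    // JDK classes and the configured shared packages are never loaded locally.
    boolean isParentFirst(String name) {
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Parent last: try the connector's own jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the connector's jars; fall back to the parent below.
                }
            }
            if (c == null) {
                // Standard parent-first delegation as the fallback.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

A server using such a loader would create one instance per connector, passing the URLs of
that connector's jars and the server's own classloader as parent. The fallback to the parent
is exactly the case described in the proposal: a connector that silently picks up a server
dependency this way can break when the server upgrades that dependency.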

