hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10893) isolated classloader on the client side
Date Thu, 14 Aug 2014 00:10:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096331#comment-14096331

Sangjin Lee commented on HADOOP-10893:

Thanks for the review Jason! It's helpful as always.

bq. HADOOP_USE_CLIENT_CLASSLOADER would be a better name to keep within the HADOOP_* shell
variable namespace and is consistent with HADOOP_USER_CLASSPATH_FIRST. Similarly HADOOP_CLIENT_SYSTEM_CLASSES.

Sounds good. I'll make the change.

bq. ApplicationClassLoader was marked Public, so I'm wondering if we should leave a deprecated,
trivial derivation of the new class location just in case someone referenced it?

I did not notice that it was marked public. I'll recreate a deprecated extending class in
its current location.

bq. What was the rationale behind the Splitter change which seems unrelated?

If possible, I wanted to avoid having a dependency from this classloader class to another
library unless it's really necessary. Splitter was coming from guava. :) In theory it should
be OK even if ApplicationClassLoader used a guava class. It would be loaded by the system
classloader anyway, and it would not interfere with the ApplicationClassLoader's ability to
load a new version of the class for the user.

However, it was more of a call to minimize the external dependency from ApplicationClassLoader.
I believe the current version (using String.split()) is equivalent and using the Splitter
is not needed, but I'd be open to reversing it.

Would be nice if we could somehow tie the default system classes defined in RunJar with the
default for the job classloader so we don't have to remember to change it in two places going
forward. Unfortunately the job classloader one is encoded in mapred-default.xml, so I don't
know of a good way to do this offhand. Any ideas?

I struggled with that decision a bit. As you mentioned, if you want to override the defaults,
you'd need to do it in two places if you use it for the client and for the tasks as well (and
for the vast majority of the cases I would imagine that is the case).

At least I feel that it would be better if at least the default is in one place. In that sense,
how about having the default in ApplicationClassLoader itself? You still need to override
it in two places, but it feels like an improvement over the current version.

The doc comments in hadoop-config.sh should mention the client system classes variable, how
to use it, and potentially even its default value. I know, I know. Yet another place to update
if it changes, but users will likely have easy access to the config comment and not the java/javadoc
for RunJar. Or maybe the default should already be in hadoop-config.sh with a hardcoded, last-resort
fallback in RunJar if not set in hadoop-config.sh? Anyway we should at least mention the ability
to specify the system classes.

I agree it would be good to document the usage of the system classes env variable. I'll add
the comment to hadoop-config.sh. See above for where to define the default and let me know
what you think.

Would be nice if we could have a unit test to verify the functionality is working going forward.
Maybe a unit test that writes out some app code in a jar, has RunJar run it with the client
classloader, and the app code verifies it has appropriate classpath semantics? Thinking something
along the lines of how TestApplicationClassloader works but verifying RunJar setup the classloaders

Let me look into a unit test for this that involves RunJar. Do you happen to know of an existing
test that writes out classes/jars off the top of your head?

Nit: Not thrilled to see that the variable just has to be defined to anything, although I
see HADOOP_USER_CLASSPATH_FIRST set a precedent for it. Leads to unexpected behavior if a
user sees something like HADOOP_USER_CLASSPATH_FIRST=true and tries HADOOP_USER_CLASSPATH_FIRST=false.
Not a must-fix, but it'd be nice to only accept expected values for the variable. A shell
func to sanity-check a boolean env would be helpful, maybe something to tackle in a followup

Yes I stumbled on that as well, and it struck me as an odd behavior. I think I'll file a separate
JIRA to tackle that issue...

> isolated classloader on the client side
> ---------------------------------------
>                 Key: HADOOP-10893
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10893
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-10893.patch, HADOOP-10893.patch, classloader-test.tar.gz
> We have the job classloader on the mapreduce tasks that run on the cluster. It has a
benefit of being able to isolate class space for user code and avoid version clashes.
> Although it occurs less often, version clashes do occur on the client JVM. It would be
good to introduce an isolated classloader on the client side as well to address this. A natural
point to introduce this may be through RunJar, as that's how most of hadoop jobs are run.

This message was sent by Atlassian JIRA

View raw message