hadoop-common-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12747) support wildcard in libjars argument
Date Tue, 01 Mar 2016 23:39:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174633#comment-15174633 ]

Sangjin Lee commented on HADOOP-12747:

That's very interesting. I missed the point that non-local jars are skipped only for adding
to the client's own classpath. JobResourceUploader separately parses libjars and does not
do the same filtering. Certainly since non-local libjars for the task are already supported,
we'd have to maintain that behavior for reasons of backwards compatibility.

I find the lack of consistency quite confusing. It's unclear to me how much of this behavior
is by design and how much is accidental. I assume the filtering away from the client's classpath
was done to avoid the complexity of needing to run some kind of "mini-localization" on the
client side to support non-local files.

Yes, that's what I thought as well. The inconsistency may be because {{URLClassLoader}} does
not support non-local paths by default, and we did not want the hassle of supporting them
on the client-side classpath.
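
To make the client-side filtering discussed above concrete, here is a minimal sketch of how non-local libjars entries could be dropped before building the client's classpath. It is not the actual Hadoop code: the class name ClientClasspathFilter and the method localOnly are hypothetical, and it assumes that "local" means an entry with no URI scheme or with the "file" scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
final class ClientClasspathFilter {

    /**
     * Returns the entries of a comma-separated libjars value that look local.
     * Assumption: an entry is local if it has no URI scheme or the "file" scheme.
     */
    static List<String> localOnly(String libjars) {
        List<String> local = new ArrayList<>();
        for (String entry : libjars.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String scheme;
            try {
                scheme = new URI(trimmed).getScheme();
            } catch (URISyntaxException e) {
                // Assumption: anything that is not a valid URI is a plain local path.
                scheme = null;
            }
            if (scheme == null || scheme.equalsIgnoreCase("file")) {
                local.add(trimmed);
            }
            // Non-local entries (for example hdfs://...) are skipped here only;
            // under this sketch they would still be passed along for the tasks.
        }
        return local;
    }
}
```

Under these assumptions, an entry such as hdfs://nn:8020/lib/b.jar is left off the client classpath, while /tmp/a.jar and file:///opt/c.jar are kept.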

Back to the original point, are you suggesting that we do allow wildcards for non-local paths
and do similar expansion? I can update the patch to do that. Let me know. Thanks!

> support wildcard in libjars argument
> ------------------------------------
>                 Key: HADOOP-12747
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12747
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch, HADOOP-12747.03.patch
> There is a problem when a user job adds too many dependency jars in their command line.
> The HADOOP_CLASSPATH part can be addressed, including using wildcards (*). But the same cannot
> be done with the -libjars argument. Today it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this situation.
> The idea is to handle it the same way the JVM does it: * expands to the list of jars in that
> directory. It does not traverse into any child directory.
> Also, it probably would be a good idea to do it only for libjars (i.e. don't do it for
> -files and -archives).
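
The expansion rule described in the issue (a trailing wildcard expands to the jars directly inside that directory, with no recursion) can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: the class name LibJarsWildcard and the method expand are hypothetical, only local paths using "/" as the separator are handled, and results are sorted simply to make the output deterministic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not the HADOOP-12747 patch.
final class LibJarsWildcard {

    /**
     * Expands entries of a comma-separated libjars value. An entry that is "*"
     * or ends in "/*" is replaced by the .jar/.JAR files directly inside that
     * directory; child directories are not traversed. Other entries pass through.
     */
    static List<String> expand(String libjars) {
        List<String> result = new ArrayList<>();
        for (String entry : libjars.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            boolean wildcard = trimmed.equals("*") || trimmed.endsWith("/*");
            if (!wildcard) {
                result.add(trimmed);
                continue;
            }
            // Drop the trailing "*"; a bare "*" means the current directory.
            String dirName = trimmed.equals("*")
                    ? "."
                    : trimmed.substring(0, trimmed.length() - 1);
            Path dir = Paths.get(dirName);
            List<String> jars = new ArrayList<>();
            if (Files.isDirectory(dir)) {
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                    for (Path p : stream) {
                        String name = p.getFileName().toString();
                        boolean isJar = name.endsWith(".jar") || name.endsWith(".JAR");
                        // Only regular files in this directory; no recursion.
                        if (isJar && Files.isRegularFile(p)) {
                            jars.add(p.toString());
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            Collections.sort(jars); // deterministic order for this sketch
            result.addAll(jars);
        }
        return result;
    }
}
```

For example, given a directory containing a.jar, notes.txt and sub/c.jar, expanding that directory's wildcard entry yields only the path to a.jar. Whether non-local paths should receive the same treatment is the open question in the comment above, and this sketch does not attempt it.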

This message was sent by Atlassian JIRA
