storm-commits mailing list archives

From kabh...@apache.org
Subject [1/5] storm git commit: [STORM-2191] shorten classpaths by using wildcards
Date Mon, 08 May 2017 06:39:51 GMT
Repository: storm
Updated Branches:
  refs/heads/1.x-branch 95ec555e8 -> 31e75bf5b


[STORM-2191] shorten classpaths by using wildcards

This commit resolves cherry-pick conflicts that arose while backporting the change from the Storm master branch.

Related commit in master: 2ceef0dce21895c4ef0f0b8eb5bc8248fad25e13

Instead of fully enumerating all JARs in the lib directories, we just use
a Java classpath wildcard to allow the JVM to autodiscover all JARs in the
classpath.  This affects both the Worker and LogWriter processes, as well
as Storm daemons such as the Nimbus, UI, Logviewer, and Supervisor.

This change results in shorter commands, so that you can actually see
the full content of the Worker command in `ps` output on Linux.

Prior to this change Worker commands were easily longer than 4096 bytes,
which is the default Linux kernel limit for commands being recorded into
the process table.  Longer commands get truncated, though they do get
executed.

An example of the change in Worker classpath length can be seen here:

Before:
```
-cp STORM_DIR/lib-worker/asm-5.0.3.jar:STORM_DIR/lib-worker/carbonite-1.5.0.jar:STORM_DIR/lib-worker/chill-java-0.8.0.jar:STORM_DIR/lib-worker/clojure-1.7.0.jar:STORM_DIR/lib-worker/commons-codec-1.6.jar:STORM_DIR/lib-worker/commons-collections-3.2.2.jar:STORM_DIR/lib-worker/commons-io-2.5.jar:STORM_DIR/lib-worker/commons-lang-2.5.jar:STORM_DIR/lib-worker/commons-logging-1.1.3.jar:STORM_DIR/lib-worker/curator-client-2.12.0.jar:STORM_DIR/lib-worker/curator-framework-2.12.0.jar:STORM_DIR/lib-worker/disruptor-3.3.2.jar:STORM_DIR/lib-worker/guava-16.0.1.jar:STORM_DIR/lib-worker/httpclient-4.3.3.jar:STORM_DIR/lib-worker/httpcore-4.4.1.jar:STORM_DIR/lib-worker/jgrapht-core-0.9.0.jar:STORM_DIR/lib-worker/jline-0.9.94.jar:STORM_DIR/lib-worker/json-simple-1.1.jar:STORM_DIR/lib-worker/kryo-3.0.3.jar:STORM_DIR/lib-worker/kryo-shaded-3.0.3.jar:STORM_DIR/lib-worker/libthrift-0.9.3.jar:STORM_DIR/lib-worker/log4j-api-2.8.jar:STORM_DIR/lib-worker/log4j-core-2.8.jar:STORM_DIR/lib-worker/log4j-over-slf4j-1.6.6.jar:STORM_DIR/lib-worker/log4j-slf4j-impl-2.8.jar:STORM_DIR/lib-worker/minlog-1.3.0.jar:STORM_DIR/lib-worker/netty-3.9.0.Final.jar:STORM_DIR/lib-worker/objenesis-2.1.jar:STORM_DIR/lib-worker/reflectasm-1.10.1.jar:STORM_DIR/lib-worker/servlet-api-2.5.jar:STORM_DIR/lib-worker/slf4j-api-1.7.21.jar:STORM_DIR/lib-worker/snakeyaml-1.11.jar:STORM_DIR/lib-worker/storm-client-2.0.0-SNAPSHOT.jar:STORM_DIR/lib-worker/sysout-over-slf4j-1.0.2.jar:STORM_DIR/lib-worker/zookeeper-3.4.6.jar:STORM_DIR/conf:STORM_DIR/storm-local/supervisor/stormdist/foo-topology-1-1-1493359573/stormjar.jar
```

After:
```
-cp STORM_DIR/lib-worker/*:STORM_DIR/extlib/*:STORM_DIR/conf:STORM_DIR/storm-local/supervisor/stormdist/foo-topology-1-1-1493359573/stormjar.jar
```
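
The size difference can be reproduced with a small sketch. The helper names `enumerated_classpath`, `wildcard_classpath`, and `demo` below are hypothetical and are not part of `storm.py`; the JAR names are placeholders created in a temporary directory purely for illustration.

```python
import os
import tempfile


def enumerated_classpath(lib_dir):
    """Old style: list every JAR in lib_dir explicitly."""
    jars = sorted(f for f in os.listdir(lib_dir) if f.endswith(".jar"))
    return os.pathsep.join(os.path.join(lib_dir, j) for j in jars)


def wildcard_classpath(lib_dir):
    """New style: a single wildcard entry that the JVM expands itself."""
    return os.path.join(lib_dir, "*")


def demo(jar_count=35):
    """Return (enumerated length, wildcard length) for a directory of empty placeholder JARs."""
    with tempfile.TemporaryDirectory() as lib_dir:
        for i in range(jar_count):
            open(os.path.join(lib_dir, "dep-%d.jar" % i), "w").close()
        return len(enumerated_classpath(lib_dir)), len(wildcard_classpath(lib_dir))


if __name__ == "__main__":
    old_len, new_len = demo()
    print("enumerated: %d chars, wildcard: %d chars" % (old_len, new_len))
```

The enumerated form grows with every JAR added to the directory, while the wildcard form stays the same length regardless of how many JARs it covers.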

This change also adds documentation about how Storm handles classpaths and
gives some guidance on choosing among the mechanisms for including external
libraries.

For more details on this problem and a discussion about this solution's
merits, please see [STORM-2191](https://issues.apache.org/jira/browse/STORM-2191).


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/9767255b
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/9767255b
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/9767255b

Branch: refs/heads/1.x-branch
Commit: 9767255b2627119f14f930f6b39fe5d5e7fd663d
Parents: 95ec555
Author: Erik Weathers <erikdw@gmail.com>
Authored: Fri Apr 28 00:50:18 2017 -0700
Committer: Erik Weathers <erikdw@gmail.com>
Committed: Sat May 6 22:53:11 2017 -0700

----------------------------------------------------------------------
 bin/storm.py                                    | 35 +++++++++-----------
 docs/Classpath-handling.md                      | 29 ++++++++++++++++
 docs/Setting-up-a-Storm-cluster.md              |  4 +--
 .../storm/daemon/supervisor/BasicContainer.java | 25 +++++---------
 4 files changed, 55 insertions(+), 38 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/9767255b/bin/storm.py
----------------------------------------------------------------------
diff --git a/bin/storm.py b/bin/storm.py
index 108e573..462dbda 100755
--- a/bin/storm.py
+++ b/bin/storm.py
@@ -112,31 +112,26 @@ if not os.path.exists(STORM_LIB_DIR):
     print("******************************************")
     sys.exit(1)
 
-def get_jars_full(adir):
-    files = []
-    if os.path.isdir(adir):
-        files = os.listdir(adir)
-    elif os.path.exists(adir):
-        files = [adir]
-
-    ret = []
-    for f in files:
-        if f.endswith(".jar"):
-            ret.append(os.path.join(adir, f))
+# If given path is a dir, make it a wildcard so the JVM will include all JARs in the directory.
+def get_wildcard_dir(path):
+    if os.path.isdir(path):
+        ret = [(os.path.join(path, "*"))]
+    elif os.path.exists(path):
+        ret = [path]
     return ret
 
 def get_classpath(extrajars, daemon=True):
-    ret = get_jars_full(STORM_DIR)
-    ret.extend(get_jars_full(STORM_DIR + "/lib"))
-    ret.extend(get_jars_full(STORM_DIR + "/extlib"))
+    ret = get_wildcard_dir(STORM_DIR)
+    ret.extend(get_wildcard_dir(STORM_DIR + "/lib"))
+    ret.extend(get_wildcard_dir(STORM_DIR + "/extlib"))
     if daemon:
-        ret.extend(get_jars_full(STORM_DIR + "/extlib-daemon"))
+        ret.extend(get_wildcard_dir(STORM_DIR + "/extlib-daemon"))
     if STORM_EXT_CLASSPATH != None:
         for path in STORM_EXT_CLASSPATH.split(os.pathsep):
-            ret.extend(get_jars_full(path))
+            ret.extend(get_wildcard_dir(path))
     if daemon and STORM_EXT_CLASSPATH_DAEMON != None:
         for path in STORM_EXT_CLASSPATH_DAEMON.split(os.pathsep):
-            ret.extend(get_jars_full(path))
+            ret.extend(get_wildcard_dir(path))
     ret.extend(extrajars)
     return normclasspath(os.pathsep.join(ret))
 
@@ -168,7 +163,7 @@ def resolve_dependencies(artifacts, artifact_repositories):
     # TODO: should we move some external modules to outer place?
 
     # storm-submit module doesn't rely on storm-core and relevant libs
-    extrajars = get_jars_full(STORM_DIR + "/external/storm-submit-tools")
+    extrajars = get_wildcard_dir(STORM_DIR + "/external/storm-submit-tools")
     classpath = normclasspath(os.pathsep.join(extrajars))
 
     command = [
@@ -341,8 +336,8 @@ def sql(sql_file, topology_name):
     local_jars = DEP_JARS_OPTS
     artifact_to_file_jars = resolve_dependencies(DEP_ARTIFACTS_OPTS, DEP_ARTIFACTS_REPOSITORIES_OPTS)
 
-    sql_core_jars = get_jars_full(STORM_DIR + "/external/sql/storm-sql-core")
-    sql_runtime_jars = get_jars_full(STORM_DIR + "/external/sql/storm-sql-runtime")
+    sql_core_jars = get_wildcard_dir(STORM_DIR + "/external/sql/storm-sql-core")
+    sql_runtime_jars = get_wildcard_dir(STORM_DIR + "/external/sql/storm-sql-runtime")
 
     # include storm-sql-runtime jar(s) to local jar list
     local_jars.extend(sql_runtime_jars)
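
A note on the `get_wildcard_dir` helper in the hunk above: as written, `ret` is assigned only when the path is a directory or otherwise exists, so a path that does not exist (for example a missing `extlib-daemon` directory or a stale `STORM_EXT_CLASSPATH` entry) would reach `return ret` with `ret` unbound and raise `UnboundLocalError`. The following is a defensive variant, offered as a sketch and not as what this commit contains; treating a missing path as an empty contribution is an assumption made here.

```python
import os


def get_wildcard_dir(path):
    """Return a one-element classpath list for path, or [] if path is missing."""
    ret = []  # assumption: a missing path should simply contribute nothing
    if os.path.isdir(path):
        # Directory: let the JVM pick up every JAR via a trailing wildcard.
        ret = [os.path.join(path, "*")]
    elif os.path.exists(path):
        # Plain file (e.g. a single JAR): pass it through unchanged.
        ret = [path]
    return ret
```

With this variant, callers such as `get_classpath` can keep using `ret.extend(get_wildcard_dir(...))` without checking beforehand whether each directory exists.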

http://git-wip-us.apache.org/repos/asf/storm/blob/9767255b/docs/Classpath-handling.md
----------------------------------------------------------------------
diff --git a/docs/Classpath-handling.md b/docs/Classpath-handling.md
new file mode 100644
index 0000000..d48517e
--- /dev/null
+++ b/docs/Classpath-handling.md
@@ -0,0 +1,29 @@
+---
+title: Classpath Handling
+layout: documentation
+documentation: true
+---
+### Storm is an Application Container
+
+Storm provides an application container environment, a la Apache Tomcat, which creates potential for classpath conflicts between Storm and your application.  The most common way of using Storm involves submitting an "uber JAR" containing your application code with all of its dependencies bundled in, and then Storm distributes this JAR to Worker nodes.  Then Storm runs your application within a Storm process called a `Worker` -- thus the JVM's classpath contains the dependencies of your JAR as well as whatever dependencies the Worker itself has.  So careful handling of classpaths and dependencies is critical for the correct functioning of Storm.
+
+### Adding Extra Dependencies to Classpath
+
+You no longer *need* to bundle your dependencies into your topology and create an uber JAR, there are now facilities for separately handling your topology's dependencies.  Furthermore, there are facilities for adding external dependencies into the Storm daemons.
+
+The `storm.py` launcher script allows you to include dependencies into the launched program's classpath via a few different mechanisms:
+
+1. The `--jar` and `--artifacts` options for the `storm jar` command: allow inclusion of non-bundled dependencies with your topology; i.e., allowing specification of JARs that were not bundled into the topology uber-jar.  This is required when using the `storm sql` command, which constructs a topology automatically without needing you to write code and build a topology JAR.
+2. The `${STORM_DIR}/extlib/` and `${STORM_DIR}/extlib-daemon/` directories can have dependencies added to them for inclusion of plugins & 3rd-party libraries into the Storm daemons (e.g., Nimbus, UI, Supervisor, etc. -- use `extlib-daemon/`) and other commands launched via the `storm.py` script, e.g., `storm sql` and `storm jar` (use `extlib`). Notably, this means that the Storm Worker process does not include the `extlib-daemon/` directory into its classpath.
+3. The `STORM_EXT_CLASSPATH` and `STORM_EXT_CLASSPATH_DAEMON` environment variables provide a similar functionality as those directories, but allows the user to place their external dependencies in alternative locations.
+ * There is a wrinkle here: because the Supervisor daemon launches the Worker process, if you want `STORM_EXT_CLASSPATH` to impact your Workers, you will need to specify the `STORM_EXT_CLASSPATH` for the Supervisor daemon.  That will allow the Supervisor to consult this environment variable as it constructs the classpath of the Worker processes.
+
+#### Which Facility to Choose?
+
+You might have noticed the overlap between the first mechanism and the others. If you consider the `--jar` / `--artifacts` option versus the `extlib/` / `STORM_EXT_CLASSPATH` it is not obvious which one you should choose for using dependencies with your Worker processes. i.e., both mechanisms allow including JARs to be used for running your Worker processes. Here is my understanding of the difference: `--jar` / `--artifacts` will result in the dependencies being used for running the `storm jar/sql` command, *and* the dependencies will be uploaded and available in the classpath of the topology's `Worker` processes. Whereas the use of `extlib/` / `STORM_EXT_CLASSPATH` requires you to have distributed your JAR dependencies out to all Worker nodes.  Another difference is that `extlib/` / `STORM_EXT_CLASSPATH` would impact all topologies, whereas `--jar` / `--artifacts` is a topology-specific option.
+
+### Abbreviation of Classpaths and Process Commands
+
+When the `storm.py` script launches a `java` command, it first constructs the classpath from the optional settings mentioned above, as well as including some default locations such as the `${STORM_DIR}/`, `${STORM_DIR}/lib/`, `${STORM_DIR}/extlib/` and `${STORM_DIR}/extlib-daemon/` directories.  In past releases, Storm would enumerate all JARs in those directories and then explicitly add all of those JARs into the `-cp` / `--classpath` argument to the launched `java` commands.  As such, the classpath would get so long that the `java` commands could breach the Linux Kernel process table limit of 4096 bytes for recording commands.  That led to truncated commands in `ps` output, making it hard to operate Storm clusters because you could not easily differentiate the processes nor easily see from `ps` which port a worker is listening to.
+
+After Storm dropped support for Java 5, this classpath expansion was no longer necessary, because Java 6 supports classpath wildcards. Classpath wildcards allow you to specify a directory ending with a `*` element, such as `foo/bar/*`, and the JVM will automatically expand the classpath to include all `.jar` files in the wildcard directory.  As of [STORM-2191](https://issues.apache.org/jira/browse/STORM-2191) Storm just uses classpath wildcards instead of explicitly listing all JARs, thereby shortening all of the commands and making operating Storm clusters a bit easier.
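
To make the wildcard behavior described in the new documentation concrete, here is a rough Python emulation of how a JVM treats a classpath entry whose final element is `*`: it stands for the files ending in `.jar` or `.JAR` directly inside that directory, without descending into subdirectories. The function name `expand_classpath_wildcards` is hypothetical, and the sorting is only there to make the output deterministic; the JVM does not promise any particular order.

```python
import os


def expand_classpath_wildcards(classpath):
    """Approximate the JVM's expansion of '*' classpath entries (illustrative only)."""
    entries = []
    for entry in classpath.split(os.pathsep):
        head, tail = os.path.split(entry)
        if tail != "*":
            # Ordinary entry (directory of classes or a single JAR): keep as-is.
            entries.append(entry)
            continue
        directory = head or os.curdir
        if not os.path.isdir(directory):
            # A wildcard over a missing directory contributes nothing.
            continue
        # Only .jar/.JAR files directly in the directory; no recursion.
        jars = sorted(
            f for f in os.listdir(directory)
            if f.endswith((".jar", ".JAR"))
            and os.path.isfile(os.path.join(directory, f))
        )
        entries.extend(os.path.join(directory, j) for j in jars)
    return entries
```

One practical consequence of this rule is that class files and JARs in nested subdirectories are not picked up by a wildcard entry, so each directory that should contribute JARs needs its own `dir/*` element.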

http://git-wip-us.apache.org/repos/asf/storm/blob/9767255b/docs/Setting-up-a-Storm-cluster.md
----------------------------------------------------------------------
diff --git a/docs/Setting-up-a-Storm-cluster.md b/docs/Setting-up-a-Storm-cluster.md
index a251d1a..56efc00 100644
--- a/docs/Setting-up-a-Storm-cluster.md
+++ b/docs/Setting-up-a-Storm-cluster.md
@@ -102,9 +102,9 @@ The time to allow any given healthcheck script to run before it is marked failed
 storm.health.check.timeout.ms: 5000
 ```
 
-### Configure external libraries and environmental variables (optional)
+### Configure external libraries and environment variables (optional)
 
-If you need support from external libraries or custom plugins, you can place such jars into the extlib/ and extlib-daemon/ directories. Note that the extlib-daemon/ directory stores jars used only by daemons (Nimbus, Supervisor, DRPC, UI, Logviewer), e.g., HDFS and customized scheduling libraries. Accordingly, two environmental variables STORM_EXT_CLASSPATH and STORM_EXT_CLASSPATH_DAEMON can be configured by users for including the external classpath and daemon-only external classpath.
+If you need support from external libraries or custom plugins, you can place such jars into the extlib/ and extlib-daemon/ directories. Note that the extlib-daemon/ directory stores jars used only by daemons (Nimbus, Supervisor, DRPC, UI, Logviewer), e.g., HDFS and customized scheduling libraries. Accordingly, two environment variables STORM_EXT_CLASSPATH and STORM_EXT_CLASSPATH_DAEMON can be configured by users for including the external classpath and daemon-only external classpath. See [Classpath handling](Classpath-handling.html)] for more details on using external libraries.
 
 
 ### Launch daemons under supervision using "storm" script and a supervisor of your choice

http://git-wip-us.apache.org/repos/asf/storm/blob/9767255b/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java
----------------------------------------------------------------------
diff --git a/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java b/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java
index daa1d00..4e4f022 100644
--- a/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java
+++ b/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java
@@ -20,6 +20,7 @@ package org.apache.storm.daemon.supervisor;
 import java.io.File;
 import java.io.FilenameFilter;
 import java.io.IOException;
+import java.nio.file.Paths;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
@@ -329,21 +330,13 @@ public class BasicContainer extends Container {
     }
 
     /**
-     * Returns a collection of jar file names found under the given directory.
-     * @param dir the directory to search
-     * @return the jar file names
+     * Returns a path with a wildcard as the final element, so that the JVM will expand
+     * that to all JARs in the directory.
+     * @param dir the directory to which a wildcard will be appended
+     * @return the path with wildcard ("*") suffix
      */
-    protected List<String> getFullJars(File dir) {
-        File[] files = dir.listFiles(jarFilter);
-
-        if (files == null) {
-            return Collections.emptyList();
-        }
-        ArrayList<String> ret = new ArrayList<>(files.length);
-        for (File f: files) {
-            ret.add(f.getAbsolutePath());
-        }
-        return ret;
+    protected String getWildcardDir(File dir) {
+        return Paths.get(dir.toString(), "*").toString();
     }
     
     protected List<String> frameworkClasspath() {
@@ -355,8 +348,8 @@ public class BasicContainer extends Container {
         File stormExtlibDir = new File(_stormHome, "extlib");
         String extcp = System.getenv("STORM_EXT_CLASSPATH");
         List<String> pathElements = new LinkedList<>();
-        pathElements.addAll(getFullJars(stormLibDir));
-        pathElements.addAll(getFullJars(stormExtlibDir));
+        pathElements.add(getWildcardDir(stormLibDir));
+        pathElements.add(getWildcardDir(stormExtlibDir));
         pathElements.add(extcp);
         pathElements.add(stormConfDir);
 

