spark-user mailing list archives

From Tim Chen <...@mesosphere.io>
Subject Re: JNI issues with mesos
Date Wed, 09 Sep 2015 16:43:58 GMT
Hi Adrian,

Spark expects a specific name for the tgz and for the folder inside it,
namely the ones generated by running make-distribution.sh --tgz in the
Spark source folder.
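
To make the convention concrete, here is a minimal sketch of the names involved. It assumes the script names its output spark-<version>-bin-<name>, which matches the spark-1.5.0-bin-os1.tgz mentioned further down the thread; dist_name is a hypothetical helper, not part of Spark.

```shell
# Hypothetical helper (not part of Spark): builds the name that
# make-distribution.sh --tgz --name <name> is assumed to produce.
dist_name() {
  # $1 = Spark version, $2 = value passed to --name
  printf 'spark-%s-bin-%s\n' "$1" "$2"
}

name=$(dist_name 1.5.0 os1)
echo "folder inside the tgz: $name"
echo "tgz file name:         $name.tgz"
```

If the tgz in HDFS is renamed to something else, the folder inside still keeps the generated name, so the two no longer line up.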

If you generate a Spark 1.4 tgz with that script, keep the name it produces,
upload it to HDFS again and fix the URI, then it should work.
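
As a rough illustration of why the name matters: as far as I recall, the Mesos backend takes the last path segment of the executor URI, keeps the part before the first '.', and runs cd <that part>* before calling ./bin/spark-class. Treat that as an assumption and check the source for your version; cd_prefix below is a hypothetical helper that only mimics the string handling.

```shell
# Hypothetical helper: mimics how the directory glob is assumed to be
# derived from the executor URI (assumption, not copied from Spark).
cd_prefix() {
  file=${1##*/}              # last path segment, e.g. spark15.tgz
  printf '%s\n' "${file%%.*}"  # everything before the first '.'
}

cd_prefix hdfs:///apps/spark/spark15.tgz              # prints spark15
cd_prefix hdfs:///apps/spark/spark-1.5.0-bin-os1.tgz  # prints spark-1
```

Under that assumption, spark15* only matches the downloaded spark15.tgz file, which fits the "can't cd to spark15.tgz" line in the quoted stderr, while spark-1* also matches the extracted spark-1.5.0-bin-os1 folder.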

Tim

On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett <adrian@opensignal.com>
wrote:

> 5mins later...
>
> Trying 1.5 with a fairly plain build:
> ./make-distribution.sh --tgz --name os1 -Phadoop-2.6
>
> and on my first attempt stderr showed:
> I0909 15:16:49.392144  1619 fetcher.cpp:441] Fetched
> 'hdfs:///apps/spark/spark15.tgz' to
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S1/frameworks/20150826-133446-3217621258-5050-4064-211204/executors/20150826-133446-3217621258-5050-4064-S1/runs/43026ba8-6624-4817-912c-3d7573433102/spark15.tgz'
> sh: 1: cd: can't cd to spark15.tgz
> sh: 1: ./bin/spark-class: not found
>
> Aha, let's rename the file in hdfs (and the two configs) from spark15.tgz
> to spark-1.5.0-bin-os1.tgz...
> Success!!!
>
> The same trick with 1.4 doesn't work, but now that I have something that
> does I can make progress.
>
> Hopefully this helps someone else :-)
>
> Adrian
>
>
> On 09/09/2015 16:59, Adrian Bridgett wrote:
>
> I'm trying to run Spark (1.4.1) on top of Mesos (0.23).  I've followed the
> instructions (uploaded the Spark tarball to HDFS, set the executor URI in
> both places etc) and yet on the slaves it's failing to launch even the
> SparkPi example with a JNI error.  It does run with a local master.  A day
> of debugging later and it's time to ask for help!
>
>  bin/spark-submit --master mesos://10.1.201.191:5050 --class
> org.apache.spark.examples.SparkPi /tmp/examples.jar
>
> (I'm putting the jar outside hdfs  - on both client box + slave (turned
> off other slaves for debugging) - due to
> http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html.
> I should note that I had the same JNI errors when using the mesos cluster
> dispatcher).
>
> I'm using Oracle Java 8 (no other java - even openjdk - is installed)
>
> As you can see, the slave is downloading the framework fine (you can even
> see it extracted on the slave).  Can anyone shed some light on what's going
> on - e.g. how is it attempting to run the executor?
>
> I'm going to try a different JVM (and try a custom spark distribution) but
> I suspect that the problem is much more basic. Maybe it can't find the
> hadoop native libs?
>
> Any light would be much appreciated :)  I've included the slave's stderr
> below:
>
> I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
> I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info:
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
> I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI
> 'hdfs:///apps/spark/spark.tgz'
> I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly into the
> sandbox directory
> I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI
> 'hdfs:///apps/spark/spark.tgz'
> I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource with
> Hadoop client from 'hdfs:///apps/spark/spark.tgz' to
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with command: tar
> -C
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
> -xf
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> into
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
> W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of extracting
> resource from URI with 'extract' flag, because it does not seem to be an
> archive: hdfs:///apps/spark/spark.tgz
> I0909 14:14:07.489791 32132 fetcher.cpp:441] Fetched
> 'hdfs:///apps/spark/spark.tgz' to
> '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
> Error: A JNI error has occurred, please check your installation and try
> again
> Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more
>
>
>
> --
> *Adrian Bridgett* |  Sysadmin Engineer, OpenSignal
> <http://www.opensignal.com>
> _____________________________________________________
> Office: First Floor, Scriptor Court, 155-157 Farringdon Road, Clerkenwell,
> London, EC1R 3AD
> Phone #: +44 777-377-8251
> Skype: abridgett  |  @adrianbridgett <http://twitter.com/adrianbridgett>  |
>  LinkedIn link  <https://uk.linkedin.com/in/abridgett>
> _____________________________________________________
>
