spark-user mailing list archives

From Adrian Bridgett <adr...@opensignal.com>
Subject Re: JNI issues with mesos
Date Wed, 09 Sep 2015 18:33:22 GMT
Thanks Tim,

There's a little more to it, in fact - if I use the 
pre-built-with-hadoop-2.6 binaries, all is good (with correctly named 
tarballs in HDFS).  Using the build with user-provided Hadoop 
(including setting SPARK_DIST_CLASSPATH in spark-env.sh), I get the 
JNI exception.

Aha - I've found the minimal set of changes that fixes it.  I can use 
the user-provided-Hadoop tarballs, but I _have_ to add spark-env.sh to 
them (which I wasn't expecting - I don't recall seeing this anywhere in 
the docs, so I was expecting everything to be set up by Spark/Mesos 
from the client config).

FWIW, spark-env.sh:
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
#export MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs:///apps/spark/spark15.tgz

Leaving out SPARK_DIST_CLASSPATH leads to 
org.apache.hadoop.fs.FSDataInputStream class errors (as you'd expect).
Leaving out MESOS_NATIVE_JAVA_LIBRARY seems to have no consequences at 
the moment (it is set on the client).
Leaving out SPARK_EXECUTOR_URI stops the job from starting at all.
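
For completeness, this is roughly how I'm adding spark-env.sh to the 
executor tarball (names are ours from the build mentioned below, so 
adjust; per Tim's note the directory inside still has to match the 
tarball name):

tar xzf spark-1.5.0-bin-os1.tgz
cp spark-env.sh spark-1.5.0-bin-os1/conf/
tar czf spark-1.5.0-bin-os1.tgz spark-1.5.0-bin-os1
hadoop fs -put -f spark-1.5.0-bin-os1.tgz /apps/spark/

and then pointing spark.executor.uri / SPARK_EXECUTOR_URI at the 
uploaded file.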

spark-defaults.conf isn't required to be in the tarball; on the client 
it's set to:
spark.master mesos://zk://mesos-1.example.net:2181,mesos-2.example.net:2181,mesos-3.example.net:2181/mesos
spark.executor.uri hdfs:///apps/spark/spark15.tgz
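
With that in place, submitting is just the usual thing (same SparkPi 
example as in the thread below; spark-submit picks the master up from 
spark-defaults.conf):

bin/spark-submit --class org.apache.spark.examples.SparkPi /tmp/examples.jar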

I guess this is the way forward for us right now - a bit uncomfortable, 
as I like to understand why :-)

On 09/09/2015 18:43, Tim Chen wrote:
> Hi Adrian,
>
> Spark is expecting a specific naming of the tgz and also the folder 
> name inside, as this is generated by running make-distribution.sh 
> --tgz in the Spark source folder.
>
> If you use a Spark 1.4 tgz generated with that script, keep the same 
> name, upload it to HDFS again and fix the URI, then it should work.
>
> Tim
>
> On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett 
> <adrian@opensignal.com> wrote:
>
>     5mins later...
>
>     Trying 1.5 with a fairly plain build:
>     ./make-distribution.sh --tgz --name os1 -Phadoop-2.6
>
>     and on my first attempt stderr showed:
>     I0909 15:16:49.392144  1619 fetcher.cpp:441] Fetched
>     'hdfs:///apps/spark/spark15.tgz' to
>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S1/frameworks/20150826-133446-3217621258-5050-4064-211204/executors/20150826-133446-3217621258-5050-4064-S1/runs/43026ba8-6624-4817-912c-3d7573433102/spark15.tgz'
>     sh: 1: cd: can't cd to spark15.tgz
>     sh: 1: ./bin/spark-class: not found
>
>     Aha, let's rename the file in hdfs (and the two configs) from
>     spark15.tgz to spark-1.5.0-bin-os1.tgz...
>     Success!!!
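>
>     (For reference, that rename is just something along the lines of
>     hadoop fs -mv /apps/spark/spark15.tgz /apps/spark/spark-1.5.0-bin-os1.tgz
>     followed by updating spark.executor.uri and SPARK_EXECUTOR_URI to
>     match.)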
>
>     The same trick with 1.4 doesn't work, but now that I have
>     something that does I can make progress.
>
>     Hopefully this helps someone else :-)
>
>     Adrian
>
>
>     On 09/09/2015 16:59, Adrian Bridgett wrote:
>>     I'm trying to run spark (1.4.1) on top of mesos (0.23).  I've
>>     followed the instructions (uploaded spark tarball to HDFS, set
>>     executor uri in both places etc) and yet on the slaves it's
>>     failing to launch even the SparkPi example with a JNI error.  It
>>     does run with a local master.  A day of debugging later and it's
>>     time to ask for help!
>>
>>      bin/spark-submit --master mesos://10.1.201.191:5050 --class
>>     org.apache.spark.examples.SparkPi /tmp/examples.jar
>>
>>     (I'm putting the jar outside hdfs  - on both client box + slave
>>     (turned off other slaves for debugging) - due to
>>     http://apache-spark-user-list.1001560.n3.nabble.com/Remote-jar-file-td20649.html.
>>     I should note that I had the same JNI errors when using the mesos
>>     cluster dispatcher).
>>
>>     I'm using Oracle Java 8 (no other java - even openjdk - is installed)
>>
>>     As you can see, the slave is downloading the framework fine (you
>>     can even see it extracted on the slave).  Can anyone shed some
>>     light on what's going on - e.g. how is it attempting to run the
>>     executor?
>>
>>     I'm going to try a different JVM (and try a custom spark
>>     distribution) but I suspect that the problem is much more basic.
>>     Maybe it can't find the hadoop native libs?
>>
>>     Any light would be much appreciated :)  I've included the
>>     slaves's stderr below:
>>
>>     I0909 14:14:01.405185 32132 logging.cpp:177] Logging to STDERR
>>     I0909 14:14:01.405256 32132 fetcher.cpp:409] Fetcher Info:
>>     {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-133446-3217621258-5050-4064-S0\/ubuntu","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/\/apps\/spark\/spark.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-133446-3217621258-5050-4064-S0\/frameworks\/20150826-133446-3217621258-5050-4064-211198\/executors\/20150826-133446-3217621258-5050-4064-S0\/runs\/38077da2-553e-4888-bfa3-ece2ab2119f3","user":"ubuntu"}
>>     I0909 14:14:01.406332 32132 fetcher.cpp:364] Fetching URI
>>     'hdfs:///apps/spark/spark.tgz'
>>     I0909 14:14:01.406344 32132 fetcher.cpp:238] Fetching directly
>>     into the sandbox directory
>>     I0909 14:14:01.406358 32132 fetcher.cpp:176] Fetching URI
>>     'hdfs:///apps/spark/spark.tgz'
>>     I0909 14:14:01.679055 32132 fetcher.cpp:104] Downloading resource
>>     with Hadoop client from 'hdfs:///apps/spark/spark.tgz' to
>>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
>>     I0909 14:14:05.492626 32132 fetcher.cpp:76] Extracting with
>>     command: tar -C
>>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
>>     -xf
>>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
>>     I0909 14:14:07.489753 32132 fetcher.cpp:84] Extracted
>>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
>>     into
>>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3'
>>     W0909 14:14:07.489784 32132 fetcher.cpp:260] Copying instead of
>>     extracting resource from URI with 'extract' flag, because it does
>>     not seem to be an archive: hdfs:///apps/spark/spark.tgz
>>     I0909 14:14:07.489791 32132 fetcher.cpp:441] Fetched
>>     'hdfs:///apps/spark/spark.tgz' to
>>     '/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S0/frameworks/20150826-133446-3217621258-5050-4064-211198/executors/20150826-133446-3217621258-5050-4064-S0/runs/38077da2-553e-4888-bfa3-ece2ab2119f3/spark.tgz'
>>     Error: A JNI error has occurred, please check your installation
>>     and try again
>>     Exception in thread "main" java.lang.NoClassDefFoundError:
>>     org/slf4j/Logger
>>         at java.lang.Class.getDeclaredMethods0(Native Method)
>>         at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>>         at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>>         at java.lang.Class.getMethod0(Class.java:3018)
>>         at java.lang.Class.getMethod(Class.java:1784)
>>         at
>>     sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>>         at
>>     sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
>>     Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         ... 7 more
>>
>>
>

-- 
*Adrian Bridgett* |  Sysadmin Engineer, OpenSignal 
<http://www.opensignal.com>
_____________________________________________________
Office: First Floor, Scriptor Court, 155-157 Farringdon Road, 
Clerkenwell, London, EC1R 3AD
Phone #: +44 777-377-8251
Skype: abridgett  |@adrianbridgett <http://twitter.com/adrianbridgett>| 
LinkedIn link <https://uk.linkedin.com/in/abridgett>
_____________________________________________________
