spark-user mailing list archives

From "Nick R. Katsipoulakis" <kat...@cs.pitt.edu>
Subject SPARK_CLASSPATH Warning
Date Wed, 09 Jul 2014 20:45:07 GMT
Hello,

I have installed Apache Spark v1.0.0 on a machine with a proprietary Hadoop
distribution (v2.2.0, without YARN). Because the Hadoop distribution I am
using depends on a list of jars, I made the following changes to
conf/spark-env.sh:

#!/usr/bin/env bash

export HADOOP_CONF_DIR=/path-to-hadoop-conf/hadoop-conf
export SPARK_LOCAL_IP=impl41
export SPARK_CLASSPATH="/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*"
...

Also, to make sure everything is working, I launch the Spark shell as
follows:

[biadmin@impl41 spark]$ ./bin/spark-shell --jars /path-to-proprietary-hadoop-lib/lib/*.jar

14/07/09 13:37:28 INFO spark.SecurityManager: Changing view acls to: biadmin
14/07/09 13:37:28 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(biadmin)
14/07/09 13:37:28 INFO spark.HttpServer: Starting HTTP Server
14/07/09 13:37:29 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:29 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:44292
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (IBM J9 VM, Java 1.7.0)
Type in expressions to have them evaluated.
Type :help for more information.
14/07/09 13:37:36 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to
'path-to-proprietary-hadoop-lib/*:/path-to-proprietary-hadoop-lib/lib/*').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

14/07/09 13:37:36 WARN spark.SparkConf: Setting
'spark.executor.extraClassPath' to
'/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*'
as a work-around.
14/07/09 13:37:36 WARN spark.SparkConf: Setting
'spark.driver.extraClassPath' to
'/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*'
as a work-around.
14/07/09 13:37:36 INFO spark.SecurityManager: Changing view acls to: biadmin
14/07/09 13:37:36 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(biadmin)
14/07/09 13:37:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/09 13:37:37 INFO Remoting: Starting remoting
14/07/09 13:37:37 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@impl41:46081]
14/07/09 13:37:37 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@impl41:46081]
14/07/09 13:37:37 INFO spark.SparkEnv: Registering MapOutputTracker
14/07/09 13:37:37 INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/09 13:37:37 INFO storage.DiskBlockManager: Created local directory at
/tmp/spark-local-20140709133737-798b
14/07/09 13:37:37 INFO storage.MemoryStore: MemoryStore started with
capacity 307.2 MB.
14/07/09 13:37:38 INFO network.ConnectionManager: Bound socket to port
16685 with id = ConnectionManagerId(impl41,16685)
14/07/09 13:37:38 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/07/09 13:37:38 INFO storage.BlockManagerInfo: Registering block manager
impl41:16685 with 307.2 MB RAM
14/07/09 13:37:38 INFO storage.BlockManagerMaster: Registered BlockManager
14/07/09 13:37:38 INFO spark.HttpServer: Starting HTTP Server
14/07/09 13:37:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:38 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:21938
14/07/09 13:37:38 INFO broadcast.HttpBroadcast: Broadcast server started at
http://impl41:21938
14/07/09 13:37:38 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-91e8e040-f2ca-43dd-b574-805033f476c7
14/07/09 13:37:38 INFO spark.HttpServer: Starting HTTP Server
14/07/09 13:37:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:38 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:52678
14/07/09 13:37:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/09 13:37:38 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
14/07/09 13:37:38 INFO ui.SparkUI: Started SparkUI at http://impl41:4040
14/07/09 13:37:39 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/09 13:37:39 INFO spark.SparkContext: Added JAR
file:/opt/ibm/biginsights/IHC/lib/adaptive-mr.jar at
http://impl41:52678/jars/adaptive-mr.jar with timestamp 1404938259526
14/07/09 13:37:39 INFO executor.Executor: Using REPL class URI:
http://impl41:44292
14/07/09 13:37:39 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala>

So, my questions are the following:

Am I including my libraries correctly? Why do I get the message that setting
SPARK_CLASSPATH is deprecated?
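
For reference, my reading of the warning is that the non-deprecated
equivalent is to drop SPARK_CLASSPATH from spark-env.sh and pass the same
paths explicitly; something along these lines (the paths are the same
placeholders as above, so this is only a sketch):

[biadmin@impl41 spark]$ ./bin/spark-shell \
    --driver-class-path "/path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*"

and, for the executor side, a line in conf/spark-defaults.conf:

spark.executor.extraClassPath /path-to-proprietary-hadoop-lib/lib/*:/path-to-proprietary-hadoop-lib/*

Is that the intended replacement?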

Also, when I execute the following example:

scala> val file = sc.textFile("hdfs://lpsa.dat")
14/07/09 13:41:43 WARN util.SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes
14/07/09 13:41:43 INFO storage.MemoryStore: ensureFreeSpace(102907) called
with curMem=0, maxMem=322122547
14/07/09 13:41:43 INFO storage.MemoryStore: Block broadcast_0 stored as
values to memory (estimated size 100.5 KB, free 307.1 MB)
file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
<console>:12

scala> val errors = file.filter(line => line.contains("ERROR"))
errors: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at
<console>:14

scala> errors.count()
14/07/09 13:42:11 WARN hdfs.BlockReaderLocal: The short-circuit local reads
feature cannot be used because libhadoop cannot be loaded.
java.lang.IllegalArgumentException: java.net.UnknownHostException: lpsa.dat
    at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
    at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:231)
    at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
    at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2442)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2476)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2458)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:376)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
    at
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.FilteredRDD.getPartitions(FilteredRDD.scala:29)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1094)
    at org.apache.spark.rdd.RDD.count(RDD.scala:847)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:17)
    at $iwC$$iwC$$iwC.<init>(<console>:22)
    at $iwC$$iwC.<init>(<console>:24)
    at $iwC.<init>(<console>:26)
    at <init>(<console>:28)
    at .<init>(<console>:32)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:619)
    at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
    at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
    at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:619)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: lpsa.dat
    ... 71 more


scala>

Why do I get this UnknownHostException on the file, and what does the
following message mean:

"hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be
used because libhadoop cannot be loaded"
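
If I am reading the stack trace right, the "lpsa.dat" in hdfs://lpsa.dat is
being parsed as the NameNode host (the URI authority), not as a file name,
which would explain the UnknownHostException. I assume a fully qualified
path would look something like the following, where namenode-host, the
port, and the path are placeholders for my actual cluster:

scala> val file = sc.textFile("hdfs://namenode-host:9000/user/biadmin/lpsa.dat")

or, relying on the fs.defaultFS configured under HADOOP_CONF_DIR:

scala> val file = sc.textFile("hdfs:///user/biadmin/lpsa.dat")

Is that understanding correct?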

I apologize for the long message, but I have searched previous messages and
cannot figure out what I am doing wrong.

Thank you,
Nick
