Hi all,
Question from a newbie here about your excellent Spark:
I've just installed Spark 1.5.2, pre-built for Hadoop 2.4 and later, and I'm
working through the introductory documentation using local[4] to begin with.
In the pyspark shell I can run examples such as the simple application from
the quick-start guide (source below), provided that I remove the sc
initialisation, since the shell already creates sc for me. However, if I try
to run any Python script using spark-submit, I get the verbose error message
shown below and no output, and I have not been able to fix this.
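For reference, I'm launching the script with something like the following
(written from memory, so the exact invocation may differ slightly):

  c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\bin\spark-submit --master local[4] SimpleApp.py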
Any assistance would be very gratefully received.
My machine runs Windows 10 Home with 8 GB RAM on a 64-bit Intel Core i3 @
3.4 GHz. I'm using Python 2.7.11 under Anaconda 2.4.1.
Source (from
http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications):
from pyspark import SparkContext
logFile = "README.md" # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
Error output:
Traceback (most recent call last):
  File "c:/Users/Peter/spark-1.5.2-bin-hadoop2.4/SimpleApp.py", line 3, in <module>
    sc = SparkContext("local", "Simple App")
  File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\pyspark.zip\pyspark\context.py", line 113, in __init__
  File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\pyspark.zip\pyspark\context.py", line 170, in _do_init
  File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\pyspark.zip\pyspark\context.py", line 224, in _initialize_context
  File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 701, in __call__
  File "c:\Users\Peter\spark-1.5.2-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:381)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1387)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1341)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:484)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:484)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Unknown Source)