spark-user mailing list archives

From "Taylor, Ronald C" <Ronald.Tay...@pnnl.gov>
Subject a basic question on first use of PySpark shell and example, which is failing
Date Mon, 29 Feb 2016 05:36:49 GMT

Hello folks,

I am a newbie, running Spark on a small Cloudera CDH 5.5.1 cluster at our lab. I am
trying to use the PySpark shell for the first time, and am attempting to duplicate the documentation
example of creating an RDD, which I called "lines", from a text file.

I placed a text file called Warehouse.java in this HDFS location:

[rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
-rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
[rtaylor@bigdatann ~]$

I then invoked sc.textFile() in the PySpark shell. That did not work; see below. Apparently
a class is not found? I don't know why that would be the case. Any guidance would be very much
appreciated.
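[Editor's note, not part of the original message: for readers following along, the scheme portion of the URI passed to sc.textFile() selects which Hadoop filesystem is consulted — "file://" reads each node's local disk, while "hdfs://" (or a bare path, which falls back to the cluster's fs.defaultFS) reads HDFS. A minimal standalone sketch of that scheme/path split, using only the Python standard library so it runs without a Spark installation:]

```python
from urllib.parse import urlparse

# The two URI forms relevant to the question: a local-filesystem URI
# (as used in the failing call below) and an HDFS URI matching the
# `hadoop fs -ls` listing above. urlparse splits out the scheme that
# Hadoop would use to pick a FileSystem implementation.
for uri in ("file:///user/taylor/Spark/Warehouse.java",
            "hdfs:///user/rtaylor/Spark/Warehouse.java"):
    parsed = urlparse(uri)
    print(parsed.scheme, parsed.path)
# prints:
# file /user/taylor/Spark/Warehouse.java
# hdfs /user/rtaylor/Spark/Warehouse.java
```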

The Cloudera Manager for the cluster says that Spark is operating in the "green", for whatever
that is worth.

 - Ron Taylor

>>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py",
line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py",
line 36, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

>>>
