spark-user mailing list archives

From Jules Damji <dmat...@comcast.net>
Subject Re: a basic question on first use of PySpark shell and example, which is failing
Date Mon, 29 Feb 2016 06:07:17 GMT

Hello Ronald,

Since you have placed the file in HDFS, you might want to change the path to:

lines = sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java")
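
A minimal PySpark sketch of the same check is below; it assumes the file really is at /user/rtaylor/Spark/Warehouse.java (the path your hadoop fs -ls output shows) and that HDFS is the cluster's default filesystem. Note that textFile() is lazy, so a small action such as count() is needed before Spark actually touches the file.

    # Build the RDD from the HDFS path; nothing is read yet (textFile is lazy).
    lines = sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java")

    # Force a couple of small actions so the file is actually opened.
    print(lines.count())   # number of lines in Warehouse.java
    print(lines.first())   # first line, to eyeball the contents

If those two calls come back with sensible values, the path and the RDD are fine.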

Sent from my iPhone
Pardon the dumb thumb typos :)

> On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C <Ronald.Taylor@pnnl.gov> wrote:
> 
> 
> Hello folks,
> 
> I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster at our lab.
> I am trying to use the PySpark shell for the first time, and am attempting to duplicate the
> documentation example of creating an RDD, which I called "lines", from a text file.
> 
> I placed a text file called Warehouse.java in this HDFS location:
> 
> [rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
> -rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
> [rtaylor@bigdatann ~]$ 
> 
> I then invoked sc.textFile() in the PySpark shell. That did not work. See below. Apparently
> a class is not found? I don't know why that would be the case. Any guidance would be very much
> appreciated.
> 
> The Cloudera Manager for the cluster says that Spark is operating in the "green", for
> whatever that is worth.
> 
>  - Ron Taylor
> 
> >>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py",
line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py",
line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>     at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
>     at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
>     at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
> 
> >>> 
