spark-user mailing list archives

From Mohamed Lrhazi <Mohamed.Lrh...@georgetown.edu>
Subject Re: Problem getting pyspark-cassandra and pyspark working
Date Mon, 16 Feb 2015 22:46:27 GMT
Oh, I don't know. Thanks a lot, Davies. I'm going to figure that out now.
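
For the check Davies suggests below (is the class actually inside the jar?), here is a minimal sketch. It assumes only that a jar is an ordinary zip archive, so Python's zipfile module can list its entries. The function name find_class_in_jar is hypothetical, and it matches on the simple class name because the exact package of CassandraJavaUtil is not confirmed in this thread.

```python
import zipfile


def find_class_in_jar(jar_path, simple_name):
    """Return every jar entry whose file name is <simple_name>.class.

    Matching is done on the simple class name only, so the class is
    found regardless of which package it lives in.
    """
    file_name = simple_name + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return [
            entry for entry in jar.namelist()
            # Either a top-level class file or one nested in a package dir.
            if entry == file_name or entry.endswith("/" + file_name)
        ]
```

Calling find_class_in_jar("/spark/pyspark-cassandra-0.1-SNAPSHOT.jar", "CassandraJavaUtil") and getting back an empty list would suggest that the connector classes are not bundled in that jar, in which case the Cassandra connector jar would presumably have to be passed to --jars and --driver-class-path as well.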

On Mon, Feb 16, 2015 at 5:31 PM, Davies Liu <davies@databricks.com> wrote:

> It also needs the Cassandra jar:
> com.datastax.spark.connector.CassandraJavaUtil
>
> Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar?
>
>
>
> On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi
> <Mohamed.Lrhazi@georgetown.edu> wrote:
> > Yes, I am sure the system can't find the jar, but how do I fix that? My
> > submit command includes the jar:
> >
> > /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars
> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path
> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py
> >
> > and the spark output seems to indicate it is handling it:
> >
> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
> >
> >
> > I don't really know what else I could try. Any suggestions would be
> > highly appreciated.
> >
> > Thanks,
> > Mohamed.
> >
> >
> > On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <davies@databricks.com> wrote:
> >>
> >> It seems that the jar for Cassandra is not loaded; you should have
> >> it on the classpath.
> >>
> >> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi
> >> <Mohamed.Lrhazi@georgetown.edu> wrote:
> >> > Hello all,
> >> >
> >> > Trying the example code from this package
> >> > (https://github.com/Parsely/pyspark-cassandra) , I always get this
> >> > error...
> >> >
> >> > Can you see what I am doing wrong? From googling around, it seems
> >> > that the jar is somehow not found. The Spark log shows the JAR was
> >> > at least processed.
> >> >
> >> > Thank you so much.
> >> >
> >> > I am using spark-1.2.1-bin-hadoop2.4.tgz
> >> >
> >> > test2.py is simply:
> >> >
> >> > from pyspark.context import SparkConf
> >> > from pyspark_cassandra import CassandraSparkContext, saveToCassandra
> >> > conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
> >> > conf.set("spark.cassandra.connection.host", "devzero")
> >> > sc = CassandraSparkContext(conf=conf)
> >> >
> >> > [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
> >> > ...
> >> > 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
> >> > 15/02/16 05:58:45 INFO Remoting: Starting remoting
> >> > 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
> >> > 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
> >> > 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
> >> > 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> >> > 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> > 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
> >> > 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> >> > 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
> >> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
> >> > 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
> >> > 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
> >> > 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
> >> > 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
> >> > 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
> >> > 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
> >> > 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
> >> > 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
> >> > 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
> >> > 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
> >> > 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> >> > Traceback (most recent call last):
> >> >   File "/spark/test2.py", line 5, in <module>
> >> >     sc = CassandraSparkContext(conf=conf)
> >> >   File "/spark/python/pyspark/context.py", line 105, in __init__
> >> >     conf, jsc)
> >> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
> >> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
> >> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
> >> > py4j.protocol.Py4JError: Trying to call a package.
> >> >
> >> >
> >
> >
>
