spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: Problem getting pyspark-cassandra and pyspark working
Date Tue, 17 Feb 2015 00:20:04 GMT
Can you try the example in pyspark-cassandra?

If not, you could create an issue there.
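
It may also be worth checking exactly which package each jar puts `CassandraJavaUtil` in, because the JVM looks classes up by fully qualified name. The `unzip` listing in the thread shows `com/datastax/spark/connector/japi/CassandraJavaUtil.class`, while the name mentioned earlier is `com.datastax.spark.connector.CassandraJavaUtil`, so it is worth comparing that against what pyspark_cassandra.py expects. A jar is an ordinary zip archive, so a minimal Python sketch can do the same job as `unzip -l ... | grep` and print dotted class names. The helper `jar_classes_matching` is hypothetical and is not part of Spark or the connector:

```python
import zipfile
from typing import List


def jar_classes_matching(jar_path: str, simple_name: str) -> List[str]:
    """Return fully qualified names of classes in a jar with the given simple name.

    Roughly equivalent to `unzip -l <jar> | grep <simple_name>`, except that
    only exact class-name matches are returned (inner classes such as
    Foo$Bar.class are not counted as Foo).
    """
    matches = []
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            if not entry.endswith(".class"):
                continue  # skip manifests, resources, directories
            # "com/example/Foo.class" -> "com.example.Foo"
            fqcn = entry[: -len(".class")].replace("/", ".")
            if fqcn.rsplit(".", 1)[-1] == simple_name:
                matches.append(fqcn)
    return matches
```

For example, `jar_classes_matching("spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar", "CassandraJavaUtil")`. Going by the listing in the thread, that jar should report only the `japi` variant.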

On Mon, Feb 16, 2015 at 4:07 PM, Mohamed Lrhazi
<Mohamed.Lrhazi@georgetown.edu> wrote:
> So I tried building the connector from:
> https://github.com/datastax/spark-cassandra-connector
>
> which seems to include the java class referenced in the error message:
>
> [root@devzero spark]# unzip -l \
>   spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar \
>   | grep CassandraJavaUtil
>
>     14612  02-16-2015 23:25   com/datastax/spark/connector/japi/CassandraJavaUtil.class
>
> [root@devzero spark]#
>
>
> When I try running my Spark test job, I still get exactly the same error,
> even though both of my jars seem to have been processed by Spark.
>
>
> ...
> 15/02/17 00:00:45 INFO SparkUI: Started SparkUI at http://devzero:4040
> 15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:36929/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424131245595
> 15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar at http://10.212.55.42:36929/jars/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar with timestamp 1424131245623
> 15/02/17 00:00:45 INFO Utils: Copying /spark/test2.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/test2.py
> 15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:36929/files/test2.py with timestamp 1424131245624
> 15/02/17 00:00:45 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/pyspark_cassandra.py
> 15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:36929/files/pyspark_cassandra.py with timestamp 1424131245633
> 15/02/17 00:00:45 INFO Executor: Starting executor ID <driver> on host localhost
> 15/
> ....
> 15/02/17 00:00:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> Traceback (most recent call last):
>   File "/spark/test2.py", line 5, in <module>
>     sc = CassandraSparkContext(conf=conf)
>   File "/spark/python/pyspark/context.py", line 105, in __init__
>     conf, jsc)
>   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
> py4j.protocol.Py4JError: Trying to call a package.
>
>
> Am I building the wrong connector jar, or using the wrong jar?
>
> Thanks a lot,
> Mohamed.
>
>
>
> On Mon, Feb 16, 2015 at 5:46 PM, Mohamed Lrhazi
> <Mohamed.Lrhazi@georgetown.edu> wrote:
>>
>> Oh, I don't know. Thanks a lot, Davies, I'm going to figure that out now.
>>
>> On Mon, Feb 16, 2015 at 5:31 PM, Davies Liu <davies@databricks.com> wrote:
>>>
>>> It also needs the Cassandra jar:
>>> com.datastax.spark.connector.CassandraJavaUtil
>>>
>>> Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar?
>>>
>>>
>>>
>>> On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi
>>> <Mohamed.Lrhazi@georgetown.edu> wrote:
>>> > Yes, I'm sure the system can't find the jar, but how do I fix that? My
>>> > submit command includes the jar:
>>> >
>>> > /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py \
>>> >     --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar \
>>> >     --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar \
>>> >     /spark/test2.py
>>> >
>>> > and the Spark output seems to indicate that it is handling it:
>>> >
>>> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>>> >
>>> >
>>> > I don't really know what else I could try. Any suggestions are highly
>>> > appreciated.
>>> >
>>> > Thanks,
>>> > Mohamed.
>>> >
>>> >
>>> > On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <davies@databricks.com>
>>> > wrote:
>>> >>
>>> >> It seems that the Cassandra jar is not loaded; you need to have it on
>>> >> the classpath.
>>> >>
>>> >> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi
>>> >> <Mohamed.Lrhazi@georgetown.edu> wrote:
>>> >> > Hello all,
>>> >> >
>>> >> > Trying the example code from this package
>>> >> > (https://github.com/Parsely/pyspark-cassandra), I always get this
>>> >> > error.
>>> >> >
>>> >> > Can you see what I am doing wrong? From googling around, it seems
>>> >> > that the jar is somehow not found, yet the Spark log shows that the
>>> >> > JAR was at least processed.
>>> >> >
>>> >> > Thank you so much.
>>> >> >
>>> >> > I am using spark-1.2.1-bin-hadoop2.4.tgz.
>>> >> >
>>> >> > test2.py is simply:
>>> >> >
>>> >> > from pyspark.context import SparkConf
>>> >> > from pyspark_cassandra import CassandraSparkContext, saveToCassandra
>>> >> > conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
>>> >> > conf.set("spark.cassandra.connection.host", "devzero")
>>> >> > sc = CassandraSparkContext(conf=conf)
>>> >> >
>>> >> > [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c \
>>> >> >     "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py \
>>> >> >     --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar \
>>> >> >     --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
>>> >> > ...
>>> >> > 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
>>> >> > 15/02/16 05:58:45 INFO Remoting: Starting remoting
>>> >> > 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
>>> >> > 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
>>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
>>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
>>> >> > 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
>>> >> > 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>>> >> > 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >> > 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
>>> >> > 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
>>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
>>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>>> >> > 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
>>> >> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
>>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
>>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
>>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
>>> >> > 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
>>> >> > 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
>>> >> > 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
>>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
>>> >> > 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
>>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
>>> >> > 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
>>> >> > 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
>>> >> > 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>>> >> > 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
>>> >> > 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
>>> >> > 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
>>> >> > 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
>>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>>> >> > Traceback (most recent call last):
>>> >> >   File "/spark/test2.py", line 5, in <module>
>>> >> >     sc = CassandraSparkContext(conf=conf)
>>> >> >   File "/spark/python/pyspark/context.py", line 105, in __init__
>>> >> >     conf, jsc)
>>> >> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>>> >> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>>> >> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
>>> >> > py4j.protocol.Py4JError: Trying to call a package.
>>> >> >
>>> >> >
>>> >
>>> >
>>
>>
>
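
`--jars` takes a comma-separated list, while `--driver-class-path` is a JVM class path whose entries are separated by `:` on Linux, so listing both the pyspark-cassandra jar and the connector assembly on both flags is easy to get wrong by hand. The first command in the thread already passes the pyspark-cassandra jar through both flags; a minimal sketch that builds the same `spark-submit` invocation with both jars might look like this. The helper `spark_submit_args` is hypothetical, and `/spark` is simply the install path used in the thread:

```python
import os
import shlex
from typing import List, Sequence


def spark_submit_args(app: str, jars: Sequence[str], py_files: Sequence[str] = (),
                      spark_home: str = "/spark") -> List[str]:
    """Build a spark-submit argument list that ships `jars` with the job
    and also places them on the driver's class path."""
    args = [os.path.join(spark_home, "bin", "spark-submit")]
    if py_files:
        # --py-files takes a comma-separated list.
        args += ["--py-files", ",".join(py_files)]
    if jars:
        # --jars is comma-separated; --driver-class-path is a JVM class path,
        # so its entries use the platform separator (":" on Linux).
        args += ["--jars", ",".join(jars),
                 "--driver-class-path", os.pathsep.join(jars)]
    args.append(app)
    return args


args = spark_submit_args(
    "/spark/test2.py",
    jars=["/spark/pyspark-cassandra-0.1-SNAPSHOT.jar",
          "/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar"],
    py_files=["/spark/pyspark_cassandra.py"],
)
# Print a shell-ready command line; passing `args` to subprocess.run() would execute it.
print(" ".join(shlex.quote(a) for a in args))
```

Keeping the jar list in one place means the two flags cannot drift apart, which is the main reason to build the command programmatically rather than by hand.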

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

