spark-user mailing list archives

From Cassa L <lcas...@gmail.com>
Subject Why don't I see my spark jobs running in parallel in Cassandra/Spark DSE cluster?
Date Fri, 27 Oct 2017 06:05:25 GMT
Hi,
I have a Spark job with the following use case:
RDD1 and RDD2 are read from Cassandra tables. Both RDDs then go through some
transformations, and after that I do a count on the transformed data.

The code looks somewhat like this:

RDD1 = JavaFunctions.cassandraTable(...)
RDD2 = JavaFunctions.cassandraTable(...)
RDD3 = RDD1.flatMap(...)
RDD4 = RDD2.flatMap(...)

RDD3.count()
RDD4.count()

In the Spark UI I see the count() jobs running one after another. How do I
make them run in parallel? I also looked at the Cloudera discussion below,
but it does not show how to run driver-side actions in parallel. Do I just
create an Executor and run them in separate threads, as in the sketch after
the link?

https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Getting-Spark-stages-to-run-in-parallel-inside-an-application/td-p/38515
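
For example, would something like the sketch below be the right approach?
This is just a sketch on my side, assuming RDD3 and RDD4 are the JavaRDDs
from the snippet above and that the driver can safely submit jobs from
multiple threads; I'm using CompletableFuture from java.util.concurrent.

import java.util.concurrent.CompletableFuture;

// Kick off both count() actions from separate threads; since each count()
// blocks until its job finishes, submitting them from different threads
// should let the scheduler run the two jobs concurrently, assuming there
// are enough executor cores available.
CompletableFuture<Long> f3 = CompletableFuture.supplyAsync(() -> RDD3.count());
CompletableFuture<Long> f4 = CompletableFuture.supplyAsync(() -> RDD4.count());

long count3 = f3.join();  // blocks until the first job completes
long count4 = f4.join();  // blocks until the second job completes

Is this the recommended way, or is there a better pattern for this?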

Attaching a UI snapshot here. [image: Inline image 1 — Spark UI snapshot]


Thanks.
LCassa
