I have a Spark job with the following use case:
RDD1 and RDD2 are read from Cassandra tables. Each RDD then goes through some transformations, and afterwards I call count() on the transformed data.
Code somewhat looks like this:
RDD3 = RDD1.flatMap(..)
RDD4 = RDD2.flatMap(..)
RDD3.count()
RDD4.count()
In the Spark UI I can see that the two count() actions run one after another. How do I make them run in parallel? I also looked at the Cloudera discussion below, but it does not show how to run driver-side actions in parallel. Do I just run each action from its own thread in the driver?
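For reference, this is roughly what I am considering: wrapping each action in a Future so that both Spark jobs are submitted to the scheduler at the same time. This is only a sketch, assuming the RDDs above already exist in scope; I am not sure whether the FAIR scheduler also needs to be enabled for the jobs to actually interleave.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Submit both actions from separate driver threads.
// Each count() blocks its own thread, so the two Spark jobs
// can be scheduled concurrently instead of sequentially.
val f3: Future[Long] = Future { RDD3.count() }
val f4: Future[Long] = Future { RDD4.count() }

// Wait for both jobs to finish and collect the results.
val count3 = Await.result(f3, Duration.Inf)
val count4 = Await.result(f4, Duration.Inf)
```

Is this the right approach, or is there a more idiomatic way to do it?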
Attaching a UI snapshot here.