We are using Spark Streaming with Cassandra to compute concurrent users every 5 minutes. Our batch interval is 10 seconds and our block interval is 2.5 seconds.
At the end of the pipeline, we use foreachRDD to join the data in each RDD with existing data in Cassandra, update the counters, and then save the result back to Cassandra.
To the best of my understanding, in this scenario Spark Streaming produces one RDD every 10 seconds and foreachRDD processes those RDDs sequentially; that is, the foreachRDD body would never run on two batches in parallel.
Am I right?
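
To make the timing concrete, here is a minimal sketch of the arithmetic these settings imply, assuming a single receiver-based input stream (where each received block becomes one partition of the batch's RDD). It is plain Python rather than Spark code, and the constant and function names are hypothetical, chosen only for illustration.

```python
# Plain-Python arithmetic, not Spark code; all names here are hypothetical.
# Assumption: one receiver-based input stream, where each block received
# during a batch becomes one partition of that batch's RDD.

BATCH_INTERVAL_MS = 10_000   # batch interval from the question (10 seconds)
BLOCK_INTERVAL_MS = 2_500    # block interval from the question (2.5 seconds)
WINDOW_MS = 5 * 60 * 1_000   # reporting period from the question (5 minutes)


def blocks_per_batch(batch_ms: int, block_ms: int) -> int:
    """Blocks one receiver produces per batch (one RDD partition per block)."""
    return batch_ms // block_ms


def batches_per_window(window_ms: int, batch_ms: int) -> int:
    """Batches, and therefore RDDs, generated in one reporting period."""
    return window_ms // batch_ms


if __name__ == "__main__":
    print(blocks_per_batch(BATCH_INTERVAL_MS, BLOCK_INTERVAL_MS))   # 4
    print(batches_per_window(WINDOW_MS, BATCH_INTERVAL_MS))         # 30
```

Under these assumptions, each RDD passed to foreachRDD has 4 partitions per receiver, and 30 such RDDs are produced in every 5-minute period.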