Awesome, it actually seems to work. Amazing how simple it can be sometimes...

Thanks Sean!

You can parallelize on the driver side. The way to do it is almost
exactly what you have here, where you iterate over a local Scala
collection of dates and invoke a Spark operation for each one. Simply
write "" to make the local map proceed in
parallel. The Spark jobs should then be submitted simultaneously.
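A minimal sketch of the same driver-side idea follows. It uses `scala.concurrent.Future` on a dedicated thread pool instead of a parallel collection, and it is only an illustration under stated assumptions. `runJob` is a hypothetical stand-in for one independent Spark job (something like `sc.textFile(in).map(transform).saveAsTextFile(out)`) that just sleeps so the example runs without a cluster, and the dates are made-up sample values. What it shows is the key point: each blocking Spark action is issued from its own driver thread, so the jobs overlap instead of running one after another.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelJobs {
  // Hypothetical stand-in for one independent Spark job, e.g.
  // sc.textFile(in).map(transform).saveAsTextFile(out).
  // Like a Spark action, it blocks the calling thread until it finishes.
  def runJob(date: String): String = {
    Thread.sleep(200) // simulate the time the cluster spends on the job
    s"done:$date"
  }

  // Submits one job per date from its own driver thread and waits for all of them.
  def runAll(dates: Seq[String]): Seq[String] = {
    // One thread per job, because each blocking action occupies a thread.
    val pool = Executors.newFixedThreadPool(math.max(1, dates.size))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = dates.map(date => Future(runJob(date)))
      // Future.sequence keeps the results in the same order as `dates`.
      Await.result(Future.sequence(futures), 5.minutes)
    } finally {
      pool.shutdown() // let the JVM exit once all jobs are done
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample dates, for illustration only.
    val dates = Seq("2014-01-01", "2014-01-02", "2014-01-03")
    val start = System.nanoTime()
    val results = runAll(dates)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    results.foreach(println)
    println(s"elapsed: $elapsedMs ms")
  }
}
```

Spark accepts jobs submitted from several threads of the same application. Note that by default it schedules concurrent jobs in FIFO order, so one large job can still hold most of the executors; setting `spark.scheduler.mode` to `FAIR` shares them more evenly between the simultaneous jobs.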

> Hey,
> Let's say we have multiple independent jobs that each transform some data and
> store it in distinct HDFS locations. Is there a nice way to run them in
> parallel? See the following pseudocode snippet:
> =>
> sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
> It's unfortunate if they run in sequence, since the executors are not all
> used efficiently. What's the best way to parallelize execution of these
> jobs?
> Thanks,
> Anders