spark-user mailing list archives

From Sean Owen <>
Subject Re: Queue independent jobs
Date Fri, 09 Jan 2015 11:42:15 GMT
You can parallelize on the driver side. The way to do it is almost
exactly what you have here, where you're iterating over a local Scala
collection of dates and invoking a Spark operation for each. Simply
call .par on the collection to make the local map proceed in
parallel. It should invoke the Spark jobs simultaneously.
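The suggestion above can be sketched as follows. This is a minimal illustration, assuming a SparkContext `sc` is already in scope; the date list, `transform`, and the paths are placeholders, and `sc.textFile`/`saveAsTextFile` stand in for the pseudo-code `hdfsFile`/`saveAsHadoopFile` calls from the question:

```scala
// Sketch: submitting independent Spark jobs concurrently from the
// driver with a Scala parallel collection. Assumes `sc: SparkContext`
// and a `transform` function exist; dates and paths are illustrative.
val dates = Seq("2015-01-07", "2015-01-08", "2015-01-09")

// .par converts the Seq into a parallel collection, so each closure
// runs on its own thread and the blocking Spark actions below are
// submitted to the cluster concurrently rather than one after another.
dates.par.foreach { date =>
  sc.textFile(s"/input/$date")     // read this date's partition of data
    .map(transform)                // apply the per-record transformation
    .saveAsTextFile(s"/output/$date") // blocking action: runs the job
}
```

Each `saveAsTextFile` is a blocking action, so without `.par` the driver would wait for one job to finish before starting the next; with `.par` the jobs overlap and share the executors under Spark's scheduler. (On Scala 2.13+, `.par` additionally needs the scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`.)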

On Fri, Jan 9, 2015 at 10:46 AM, Anders Arpteg <> wrote:
> Hey,
> Lets say we have multiple independent jobs that each transform some data and
> store in distinct hdfs locations, is there a nice way to run them in
> parallel? See the following pseudo code snippet:
> =>
> sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
> It's unfortunate if they run in sequence, since all the executors are not
> used efficiently. What's the best way to parallelize execution of these
> jobs?
> Thanks,
> Anders
