spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Parallel actions from driver
Date Thu, 26 Mar 2015 19:35:09 GMT
You can do this much more simply, I think, with Scala's parallel
collections (try .par). There's nothing wrong with doing this, no.

Here, something is getting caught in your closure, maybe
unintentionally, that's not serializable. It's not directly related to
the parallelism.
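A minimal sketch of the `.par` suggestion, in plain Scala with no Spark dependency (the stand-in `t * 10` is hypothetical; in the poster's code each element's work would be the filter-and-save calls). Calling `.par` on the list yields a parallel collection whose `map` runs concurrently on the default ForkJoin pool:

```scala
object ParDemo {
  def main(args: Array[String]): Unit = {
    // .par converts the List into a parallel sequence; map then evaluates
    // its body for the elements concurrently on the common ForkJoin pool,
    // while preserving element order in the result.
    val results = List(1, 2, 3).par.map(t => t * 10)
    println(results.toList)  // List(10, 20, 30)
  }
}
```

Note that `.par` is built into the standard library in the Scala 2.10/2.11 versions current at the time of this thread; from Scala 2.13 onward it lives in the separate scala-parallel-collections module.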

On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
<aram.mkrtchyan.87@gmail.com> wrote:
> Hi.
>
> I'm trying to trigger DataFrame's save method in parallel from my driver.
> For that purposes I use ExecutorService and Futures, here's my code:
>
>
> val futures = List(1, 2, 3).map( t => pool.submit( new Runnable {
>
> override def run(): Unit = {
>     val commons = events.filter(_._1 == t).map(_._2.common)
>     saveAsParquetFile(sqlContext, commons, s"$t/common")
>     EventTypes.all.foreach { et =>
>         val eventData = events.filter(ev => ev._1 == t && ev._2.eventType == et).map(_._2.data)
>         saveAsParquetFile(sqlContext, eventData, s"$t/$et")
>     }
> }
>
> }))
> futures.foreach(_.get)
>
> It throws "Task is not Serializable" exception. Is it legal to use threads
> in driver to trigger actions?
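The capture problem behind this exception can be reproduced without Spark. A lambda that reads a field of its enclosing object captures `this` (the whole object), so if the enclosing class is not serializable, serializing the lambda fails; copying the field into a local val first keeps the capture down to that value. A self-contained sketch, where the `Driver` class is hypothetical and stands in for the poster's enclosing class:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream, NotSerializableException}

object ClosureDemo {
  class Driver {                          // not Serializable, like a typical driver class
    val t = 1
    // Reads the field `t`, so the lambda captures the whole Driver instance.
    def badClosure: Int => Boolean = x => x == t
    // Copies `t` into a local val first, so only a plain Int is captured.
    def goodClosure: Int => Boolean = { val localT = t; x => x == localT }
  }

  // Returns true iff the object survives Java serialization.
  def serializable(obj: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val d = new Driver
    println(serializable(d.badClosure))   // false: drags in the non-serializable Driver
    println(serializable(d.goodClosure))  // true: only the captured Int travels
  }
}
```

The same pattern applies inside the poster's `Runnable`: assigning the values the RDD closures need (such as `t`) to local vals before using them in `filter` keeps the anonymous inner class, the thread pool, and the rest of the driver state out of the serialized task.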

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

