spark-user mailing list archives

From Aram Mkrtchyan <aram.mkrtchyan...@gmail.com>
Subject Re: Parallel actions from driver
Date Fri, 27 Mar 2015 08:23:26 GMT
Thanks Sean,

It works with Scala's parallel collections.
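(For reference, a minimal sketch of the `.par` approach, with no Spark dependency: `saveAsParquetFile` is stubbed out to just record its path argument, and the partition keys `1, 2, 3` match the original code. Note that on Scala 2.13+, `.par` lives in the separate scala-parallel-collections module; this sketch assumes 2.12-era Scala where it is in the standard library.)

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object ParSave {
  // Stand-in for the poster's saveAsParquetFile: just records the path.
  val saved = new ConcurrentLinkedQueue[String]()
  def saveAsParquetFile(path: String): Unit = saved.add(path)

  def main(args: Array[String]): Unit = {
    saved.clear()
    // .par runs the closure for each element on the default ForkJoinPool,
    // so the three "save" jobs are triggered concurrently from the driver.
    Seq(1, 2, 3).par.foreach(t => saveAsParquetFile(s"$t/common"))
    assert(saved.size == 3)
  }
}
```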

On Thu, Mar 26, 2015 at 11:35 PM, Sean Owen <sowen@cloudera.com> wrote:

> You can do this much more simply, I think, with Scala's parallel
> collections (try .par). There's nothing wrong with doing this, no.
>
> Here, something is getting caught in your closure, maybe
> unintentionally, that's not serializable. It's not directly related to
> the parallelism.
>
> On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
> <aram.mkrtchyan.87@gmail.com> wrote:
> > Hi.
> >
> > I'm trying to trigger DataFrame's save method in parallel from my driver.
> > For that purpose I use an ExecutorService and Futures; here's my code:
> >
> >
> > val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
> >   override def run(): Unit = {
> >     val commons = events.filter(_._1 == t).map(_._2.common)
> >     saveAsParquetFile(sqlContext, commons, s"$t/common")
> >     EventTypes.all.foreach { et =>
> >       val eventData = events
> >         .filter(ev => ev._1 == t && ev._2.eventType == et)
> >         .map(_._2.data)
> >       saveAsParquetFile(sqlContext, eventData, s"$t/$et")
> >     }
> >   }
> > }))
> > futures.foreach(_.get)
> >
> > It throws a "Task not serializable" exception. Is it legal to use
> > threads in the driver to trigger actions?
>
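(The "Task not serializable" error Sean diagnoses above is the classic symptom of an anonymous class or lambda capturing its enclosing, non-serializable object. A minimal sketch using plain JVM serialization rather than Spark; all names here are hypothetical:)

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCapture {
  // Stands in for the enclosing driver class; deliberately NOT Serializable.
  class Driver {
    val tag = "common"
    // This lambda reads the field `tag`, i.e. `this.tag`, so it captures
    // the whole Driver instance and cannot be serialized.
    def makeBad(): Int => String = (t: Int) => s"$t/$tag"
    // Copying the field into a local first keeps the closure self-contained:
    // only the String is captured, and Strings serialize fine.
    def makeGood(): Int => String = { val local = tag; (t: Int) => s"$t/$local" }
  }

  def serializable(obj: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj); true }
    catch { case _: NotSerializableException => false }
}
```

The same pattern applies in the code above: if `events`, `sqlContext`, or the helper methods are members of a non-serializable enclosing class, the task closure drags that class along with it.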
