spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Harut Martirosyan <harut.martiros...@gmail.com>
Subject Re: Parallel actions from driver
Date Fri, 27 Mar 2015 08:21:47 GMT
This is exactly my case also, it worked, thanks Sean.

On 26 March 2015 at 23:35, Sean Owen <sowen@cloudera.com> wrote:

> You can do this much more simply, I think, with Scala's parallel
> collections (try .par). There's nothing wrong with doing this, no.
>
> Here, something is getting caught in your closure, maybe
> unintentionally, that's not serializable. It's not directly related to
> the parallelism.
>
> On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
> <aram.mkrtchyan.87@gmail.com> wrote:
> > Hi.
> >
> > I'm trying to trigger DataFrame's save method in parallel from my driver.
> > For that purposes I use ExecutorService and Futures, here's my code:
> >
> >
> > val futures = [1,2,3].map( t => pool.submit( new Runnable {
> >
> > override def run(): Unit = {
> >     val commons = events.filter(_._1 == t).map(_._2.common)
> >     saveAsParquetFile(sqlContext, commons, s"$t/common")
> >     EventTypes.all.foreach { et =>
> >         val eventData = events.filter(ev => ev._1 == t &&
> ev._2.eventType ==
> > et).map(_._2.data)
> >         saveAsParquetFile(sqlContext, eventData, s"$t/$et")
> >     }
> > }
> >
> > }))
> > futures.foreach(_.get)
> >
> > It throws "Task is not Serializable" exception. Is it legal to use
> threads
> > in driver to trigger actions?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>


-- 
RGRDZ Harut

Mime
View raw message