spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Are Task Closures guaranteed to be accessed by only one Thread?
Date Wed, 05 Oct 2016 17:12:30 GMT
I don't think this is guaranteed and don't think I'd rely on it. Ideally
your functions here aren't even stateful, because they could be
reinstantiated and/or re-executed many times due to, say, failures. Not
being stateful dodges a lot of thread-safety issues. If you're doing this
because you have some expensive shared resource and you're mapping, consider
mapPartitions and set up the resource once at the start of each partition.
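
A minimal sketch of that pattern (ExpensiveClient and its parse() method are
hypothetical stand-ins for whatever costly, non-thread-safe resource you
have):

import org.apache.spark.rdd.RDD

// Hypothetical stand-in for an expensive, non-thread-safe resource.
class ExpensiveClient {
  def parse(s: String): String = s.trim
}

def parseAll(lines: RDD[String]): RDD[String] =
  lines.mapPartitions { iter =>
    // Build the resource once per partition, not once per record; each
    // task constructs its own instance, so nothing is shared across threads.
    val client = new ExpensiveClient()
    iter.map(client.parse)
  }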

On Wed, Oct 5, 2016 at 5:23 PM Matthew Dailey <matthew.dailey1@gmail.com>
wrote:

> Looking at the programming guide
> <http://spark.apache.org/docs/1.6.1/programming-guide.html#local-vs-cluster-modes>
> for Spark 1.6.1, it states
> > Prior to execution, Spark computes the task’s closure. The closure is
> those variables and methods which must be visible for the executor to
> perform its computations on the RDD
> > The variables within the closure sent to each executor are now copies
>
> So my question is, will an executor access a single copy of the closure
> with more than one thread?  I ask because I want to know if I can ignore
> thread-safety in a function I write.  Take a look at this gist as a
> simplified example with a thread-unsafe operation being passed to map():
> https://gist.github.com/matthew-dailey/4e1ab0aac580151dcfd7fbe6beab84dc
>
> This is for Spark Streaming, but I suspect the answer is the same between
> batch and streaming.
>
> Thanks for any help,
> Matt
>
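
For concreteness, a closure of the kind the question describes might look
like the following; SimpleDateFormat is just a stand-in for any
non-thread-safe object captured by map(), since java.text.SimpleDateFormat
is documented as unsafe to share across threads:

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.rdd.RDD

def formatAll(times: RDD[Long]): RDD[String] = {
  // fmt is captured in the task closure. SimpleDateFormat is not
  // thread-safe, so this is only safe if each deserialized copy of the
  // closure is used by a single thread.
  val fmt = new SimpleDateFormat("yyyy-MM-dd")
  times.map(t => fmt.format(new Date(t)))
}

Per the reply above, that single-thread assumption isn't something to rely
on, and building the formatter inside mapPartitions sidesteps the question
entirely.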
