spark-user mailing list archives

From Michael Albert <m_albert...@yahoo.com.INVALID>
Subject are functions deserialized once per task?
Date Fri, 02 Oct 2015 16:33:21 GMT
Greetings!

Is it true that functions, such as those passed to RDD.map(), are deserialized once per task?
This seems to be the case looking at Executor.scala, but I don't really understand the code.

I'm hoping the answer is yes because that makes it easier to write code without worrying about
thread safety. For example, suppose I have something like this:

    class FooToBarTransformer {
      def transform(foo: Foo): Bar = .....
    }

Now I want to do something like this:

    val rddFoo: RDD[Foo] = ....
    val transformer = new FooToBarTransformer
    val rddBar = rddFoo.map(foo => transformer.transform(foo))

If the "transformer" object is deserialized once per task, then I do not need to worry whether
"transform()" is thread safe. If, for example, the implementation tried to "optimize" matters
by caching the deserialization, so that one object was shared by all threads in a single JVM,
then presumably one would need to worry about the thread safety of transform().
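
To make the distinction concrete, here is a minimal sketch in plain Scala. It is not Spark's actual code path: it only assumes that each task would get its copy through an ordinary Java serialization round trip. CountingTransformer, serialize and deserialize are hypothetical names made up for illustration. The point is that two deserializations of the same bytes give two independent objects, so mutable state in one copy is invisible to the other.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for FooToBarTransformer. The mutable counter is the
// kind of state that would make transform() unsafe to share across threads.
class CountingTransformer extends Serializable {
  private var calls = 0
  def transform(foo: String): String = { calls += 1; s"$foo#$calls" }
}

object DeserializePerTaskSketch {
  // Plain Java serialization; an assumption for this sketch, not Spark internals.
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(new CountingTransformer)

    // One deserialization per simulated task.
    val forTaskA = deserialize[CountingTransformer](bytes)
    val forTaskB = deserialize[CountingTransformer](bytes)

    println(forTaskA eq forTaskB)     // false: two distinct instances
    println(forTaskA.transform("x"))  // x#1
    println(forTaskA.transform("x"))  // x#2
    println(forTaskB.transform("x"))  // x#1: task B's counter was never touched
  }
}
```

If, on the other hand, a single deserialized instance were cached and handed to every thread, the two "tasks" above would be the same object and the counter would be shared.
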
Is my understanding correct? Is this likely to continue to be true in future releases? Answers
of "yes" would be much appreciated :-).

Thanks!
-Mike

