spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Wendell <>
Subject Re: Task result is serialized twice by serializer and closure serializer
Date Thu, 05 Mar 2015 01:07:13 GMT
Hey Mingyu,

I think it's broken out separately so we can record the time taken to
serialize the result. Once we serializing it once, the second
serialization should be really simple since it's just wrapping
something that has already been turned into a byte buffer. Do you see
a specific issue with serializing it twice?

I think you need to have two steps if you want to record the time
taken to serialize the result, since that needs to be sent back to the
driver when the task completes.

- Patrick

On Wed, Mar 4, 2015 at 4:01 PM, Mingyu Kim <> wrote:
> Hi all,
> It looks like the result of task is serialized twice, once by serializer (I.e. Java/Kryo
depending on configuration) and once again by closure serializer (I.e. Java). To link the
actual code,
> The first one:
> The second one:
> This serializes the "value", which is the result of task run twice, which affects things
like collect(), takeSample(), and toLocalIterator(). Would it make sense to simply serialize
the DirectTaskResult once using the regular "serializer" (as opposed to closure serializer)?
Would it cause problems when the Accumulator values are not Kryo-serializable?
> Alternatively, if we can assume that Accumator values are small, we can closure-serialize
those, put the serialized byte array in DirectTaskResult with the raw task result "value",
and serialize DirectTaskResult.
> What do people think?
> Thanks,
> Mingyu

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message