spark-user mailing list archives

From Kohki Nishio <tarop...@gmail.com>
Subject Re: JavaSerializerInstance is slow
Date Wed, 08 Sep 2021 04:31:55 GMT
A Spark job creates 200 partitions, and the executor threads all try to
deserialize the task at the same time. Because they are all deserializing
the same task and loadClass takes a lock per class name, the threads end up
blocking one another in a chain. I often see many threads forming such a
chain in the thread dumps.
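
To illustrate the per-class-name lock described above, here is a minimal Java sketch of a class loader that remembers classes it has already resolved, so repeated lookups of the same name return from a concurrent map instead of entering the synchronized section of ClassLoader.loadClass. CachingClassLoader and cachedCount are hypothetical names, not Spark or Scala APIs, and nothing in this thread establishes that Spark's task deserialization or Scala's LambdaDeserializer can be routed through such a loader. Treat it as a picture of the mechanism, not a fix.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not part of Spark or Scala.
final class CachingClassLoader extends ClassLoader {
    static {
        // Request per-class-name locking in the base class instead of
        // locking the whole loader on every loadClass call.
        ClassLoader.registerAsParallelCapable();
    }

    // Classes already resolved through this loader, keyed by binary name.
    private final ConcurrentHashMap<String, Class<?>> cache = new ConcurrentHashMap<>();

    CachingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Fast path: a cache hit never touches the class-loading lock.
        Class<?> cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        // Slow path: delegate to the normal (synchronized) lookup once,
        // then remember the result for later callers.
        Class<?> loaded = super.loadClass(name, resolve);
        cache.putIfAbsent(name, loaded);
        return loaded;
    }

    // Number of distinct class names cached so far.
    int cachedCount() {
        return cache.size();
    }
}
```

With such a loader, only the first lookup of a given name pays for the lock. Later lookups from any thread are plain map reads. The cache is never cleared in this sketch, which assumes the set of classes is fixed for the lifetime of the JVM.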

We're using Spark as a high-TPS search engine, and we can't really afford
to allocate a resource per query, so we're going with local mode. I believe
other people run a similar setup in production, but anyway, thanks for the
comments. For now, the Java deserializer seems to be the only option, so
I'll have to add more machines to handle higher TPS.

Thanks
-Kohki

On Fri, Sep 3, 2021 at 5:40 AM Sean Owen <srowen@gmail.com> wrote:

> I don't know if java serialization is slow in that case; that shows
> blocking on a class load, which may or may not be directly due to
> deserialization.
> Indeed I don't think (some) things are serialized in local mode within one
> JVM, so not sure that's actually what's going on.
>
> On Thu, Sep 2, 2021 at 11:58 PM Antonin Delpeuch (lists) <
> lists@antonin.delpeuch.eu> wrote:
>
>> Hi Kohki,
>>
>> Serialization of tasks happens in local mode too and as far as I am
>> aware there is no way to disable this (although it would definitely be
>> useful in my opinion).
>>
>> You can see the local mode as a testing mode, in which you would want to
>> catch any serialization errors, before they appear in production.
>>
>> There are also some important bugs that are present in local mode and
>> are not deemed worth fixing because it is not intended to be used in
>> production (https://issues.apache.org/jira/browse/SPARK-5300).
>>
>> I think there would definitely be interest in having a reliable and
>> efficient local mode in Spark but it's a pretty different use case than
>> what Spark originally focused on.
>>
>> Antonin
>>
>> On 03/09/2021 05:56, Kohki Nishio wrote:
>> > I'm seeing many threads doing deserialization of a task, I understand
>> > since lambda is involved, we can't use Kryo for those purposes.
>> > However I'm running it in local mode, this serialization is not really
>> > necessary, no?
>> >
>> > Is there any trick I can apply to get rid of this thread contention ?
>> > I'm seeing many of the below threads in thread dumps ...
>> >
>> >
>> > "Executor task launch worker for task 11.0 in stage 15472514.0 (TID
>> > 19788863)" #732821 daemon prio=5 os_prio=0 tid=0x00007f02581b2800
>> > nid=0x355d waiting for monitor entry [0x00007effd1e3f000]
>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>> > at java.lang.ClassLoader.loadClass(ClassLoader.java:400)
>> > - waiting to lock <0x00007f0f7246edf8> (a java.lang.Object)
>> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
>> > at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>> > at scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:51)
>> > at scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)
>> >
>> >
>> > Thanks
>> > -Kohki
>>

-- 
Kohki Nishio
