spark-user mailing list archives

From Kohki Nishio <tarop...@gmail.com>
Subject Re: JavaSerializerInstance is slow
Date Fri, 10 Sep 2021 23:16:29 GMT
Here is a snippet from one of the thread dumps. This is what I mean by slow:
there are always many threads waiting.
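
To put a number on this instead of eyeballing dumps, something like the following could be run inside the same JVM. It is only a minimal sketch: the class name BlockedInLoadClass and the "class name ends with ClassLoader" heuristic are assumptions made for illustration and are not part of Spark. It uses the standard java.lang.management API to count threads that are BLOCKED with a loadClass frame on their stack, which is the pattern in the trace quoted further down.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Spark: counts threads that are BLOCKED
// while inside some ClassLoader's loadClass method.
public class BlockedInLoadClass {

    // True if the thread is BLOCKED and any frame on its stack is a
    // loadClass call on a class whose name ends with "ClassLoader"
    // (heuristic; matches java.lang.ClassLoader and Launcher$AppClassLoader).
    static boolean isBlockedInLoadClass(Thread.State state, StackTraceElement[] stack) {
        if (state != Thread.State.BLOCKED) {
            return false;
        }
        for (StackTraceElement frame : stack) {
            if ("loadClass".equals(frame.getMethodName())
                    && frame.getClassName().endsWith("ClassLoader")) {
                return true;
            }
        }
        return false;
    }

    // Takes a snapshot of all live threads and counts the matching ones.
    static long countBlockedInLoadClass() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long count = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null
                    && isBlockedInLoadClass(info.getThreadState(), info.getStackTrace())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("threads blocked in loadClass: " + countBlockedInLoadClass());
    }
}
```

Sampling countBlockedInLoadClass() periodically while queries are running would show whether the number of waiting threads grows with load.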

On Tue, Sep 7, 2021 at 9:31 PM Kohki Nishio <taroplus@gmail.com> wrote:

> A Spark job creates 200 partitions, and executors try to deserialize
> the task at the same time. That creates a chain of blocking situations, as
> all executors are deserializing the same task and loadClass takes a lock per
> class name. In the thread dumps I often see many threads forming that
> chain.
>
> We're using Spark as a high-TPS search engine and can't really afford to
> allocate a resource per query, so we're going with local mode. I believe
> other people use it in a similar way in production. Anyway, thanks for the
> comments. For now, it seems the Java deserializer is the only option, so
> I'll have to add more machines to handle higher TPS.
>
> Thanks
> -Kohki
>
> On Fri, Sep 3, 2021 at 5:40 AM Sean Owen <srowen@gmail.com> wrote:
>
>> I don't know if java serialization is slow in that case; that shows
>> blocking on a class load, which may or may not be directly due to
>> deserialization.
>> Indeed I don't think (some) things are serialized in local mode within
>> one JVM, so not sure that's actually what's going on.
>>
>> On Thu, Sep 2, 2021 at 11:58 PM Antonin Delpeuch (lists) <
>> lists@antonin.delpeuch.eu> wrote:
>>
>>> Hi Kohki,
>>>
>>> Serialization of tasks happens in local mode too and as far as I am
>>> aware there is no way to disable this (although it would definitely be
>>> useful in my opinion).
>>>
>>> You can see the local mode as a testing mode, in which you would want to
>>> catch any serialization errors, before they appear in production.
>>>
>>> There are also some important bugs that are present in local mode and
>>> are not deemed worth fixing because it is not intended to be used in
>>> production (https://issues.apache.org/jira/browse/SPARK-5300).
>>>
>>> I think there would definitely be interest in having a reliable and
>>> efficient local mode in Spark but it's a pretty different use case than
>>> what Spark originally focused on.
>>>
>>> Antonin
>>>
>>> On 03/09/2021 05:56, Kohki Nishio wrote:
>>> > I'm seeing many threads deserializing a task. I understand that
>>> > since a lambda is involved, we can't use Kryo for this.
>>> > However, I'm running in local mode, so this serialization is not really
>>> > necessary, is it?
>>> >
>>> > Is there any trick I can apply to get rid of this thread contention?
>>> > I'm seeing many threads like the one below in thread dumps ...
>>> >
>>> >
>>> > "Executor task launch worker for task 11.0 in stage 15472514.0 (TID 19788863)" #732821 daemon prio=5 os_prio=0 tid=0x00007f02581b2800 nid=0x355d waiting for monitor entry [0x00007effd1e3f000]
>>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:400)
>>> >     - waiting to lock <0x00007f0f7246edf8> (a java.lang.Object)
>>> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
>>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>> >     at scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:51)
>>> >     at scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)
>>> >
>>> >
>>> > Thanks
>>> > -Kohki
>>>
>
> --
> Kohki Nishio
>


-- 
Kohki Nishio
