spark-user mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: Spark Structured Streaming resource contention / memory issue
Date Fri, 12 Oct 2018 12:57:43 GMT
Hi Patrick,

It looks like you might be struggling with state memory. Several issues
related to it are going to be resolved in Spark 2.4.

1. SPARK-24441 [1]: Expose total estimated size of states in
HDFSBackedStateStoreProvider
2. SPARK-24637 [2]: Add metrics regarding state and watermark to dropwizard
metrics
3. SPARK-24717 [3]: Split out min retain version of state for memory in
HDFSBackedStateStoreProvider

There are other patches relevant to the state store as well, but the issues
above apply to map/flatMapGroupsWithState.
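
As a rough sketch of how the settings behind these issues might be passed to
a job: to my knowledge, spark.sql.streaming.metricsEnabled turns on the
dropwizard reporting that SPARK-24637 extends, and
spark.sql.streaming.maxBatchesToRetainInMemory is the option introduced by
SPARK-24717 (default 2). The helper name and the chosen values are
illustrative assumptions, not recommendations.

```python
# Hypothetical helper: renders Spark settings as spark-submit --conf arguments.
# The values below are illustrative assumptions, not tuned recommendations.
STATE_TUNING_CONF = {
    # Report streaming query metrics through dropwizard.
    "spark.sql.streaming.metricsEnabled": "true",
    # Keep fewer state versions cached in executor memory (default is 2).
    "spark.sql.streaming.maxBatchesToRetainInMemory": "1",
}


def to_submit_args(conf):
    """Return a flat list of ['--conf', 'key=value', ...] in key order."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(STATE_TUNING_CONF)))
```

Lowering the number of retained versions trades memory for more state
reloads from files, so it is worth comparing batch durations before and
after the change.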

Since the Spark community is in the process of releasing Spark 2.4.0, could
you try experimenting with a Spark 2.4.0 RC, if you don't mind? Alternatively,
you could try applying the individual patches and see whether they help.
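
When experimenting, one way to watch state size is to inspect the JSON of
each streaming query progress event. The following is a minimal sketch,
assuming the progress JSON has a stateOperators array with numRowsTotal,
memoryUsedBytes and (from 2.4) a customMetrics map containing
stateOnCurrentVersionSizeBytes. The sample payload and the function name are
made up for illustration.

```python
import json

# Made-up sample payload, shaped like a streaming query progress event.
SAMPLE_PROGRESS = """
{
  "batchId": 42,
  "stateOperators": [
    {
      "numRowsTotal": 1000,
      "numRowsUpdated": 10,
      "memoryUsedBytes": 2048,
      "customMetrics": {
        "stateOnCurrentVersionSizeBytes": 512,
        "loadedMapCacheHitCount": 7,
        "loadedMapCacheMissCount": 1
      }
    }
  ]
}
"""


def summarize_state(progress_json):
    """Extract per-operator state size figures from a progress JSON string."""
    progress = json.loads(progress_json)
    summary = []
    for op in progress.get("stateOperators", []):
        custom = op.get("customMetrics", {})
        summary.append({
            "rows": op.get("numRowsTotal", 0),
            "memory_used_bytes": op.get("memoryUsedBytes", 0),
            # None when the metric is absent (e.g. on versions before 2.4).
            "current_version_bytes": custom.get("stateOnCurrentVersionSizeBytes"),
        })
    return summary


if __name__ == "__main__":
    for entry in summarize_state(SAMPLE_PROGRESS):
        print(entry)
```

A large gap between the total memory used and the size of the current
version would suggest that older cached versions account for much of the
memory.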

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-24441
2. https://issues.apache.org/jira/browse/SPARK-24637
3. https://issues.apache.org/jira/browse/SPARK-24717


On Fri, 12 Oct 2018 at 9:31 PM, Patrick McGloin <mcgloin.patrick@gmail.com>
wrote:

> Hi all,
>
> I sent this earlier but the screenshots were not attached. Hopefully
> this time it is correct.
>
> We have a Spark Structured Streaming stream which is using
> mapGroupsWithState. After some time of processing in a stable manner,
> each mini batch suddenly starts taking 40 seconds. Suspiciously, it looks
> like exactly 40 seconds each time. Before this, the batches were taking less
> than a second.
>
>
> Looking at the details for a particular task, most partitions are processed
> really quickly, but a few take exactly 40 seconds:
>
>
>
>
> The GC was looking OK while the data was being processed quickly, but suddenly
> the full GCs etc. stop (at the same time as the 40 second issue):
>
>
>
> I have taken a thread dump from one of the executors while this issue is
> happening, but I cannot see any resource they are blocked on:
>
>
>
>
> Are we hitting a GC problem and why is it manifesting in this way? Is
> there another resource that is blocking and what is it?
>
>
> Thanks,
> Patrick
>
>
>
> This message has been sent by ABN AMRO Bank N.V., which has its seat at Gustav
> Mahlerlaan 10 (1082 PP) Amsterdam, the Netherlands,
> and is registered in the Commercial Register of Amsterdam under number
> 34334259.
>
