spark-dev mailing list archives

From Imran Rashid <iras...@cloudera.com>
Subject Re: Fair scheduler pool leak
Date Fri, 06 Apr 2018 15:08:37 GMT
Hi Matthias,

This doesn't look possible now.  It may be worth filing an improvement
jira.

But I'm trying to understand what you're trying to do a little better.  So
you intentionally have each thread create a new unique pool when it
submits a job?  So that pool will just get the default pool configuration,
and you will see lots of these messages in your logs?

https://github.com/apache/spark/blob/6ade5cbb498f6c6ea38779b97f2325d5cf5013f2/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L196-L200

What is the use case for creating pools this way?
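If I follow, the pattern on your side looks roughly like this: just a sketch
to confirm my understanding, with a made-up pool name and job body, assuming
you set the standard spark.scheduler.pool local property from each worker
thread.

    // per-thread pool name (made up for illustration)
    val poolName = "pool-" + Thread.currentThread().getId
    // jobs submitted from this thread are assigned to poolName
    sc.setLocalProperty("spark.scheduler.pool", poolName)
    try {
      sc.parallelize(1 to 100).count()  // a short parallel section
    } finally {
      // clears only the thread-local property, not the registered pool
      sc.setLocalProperty("spark.scheduler.pool", null)
    }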

Also, if I understand correctly, it doesn't even matter if the thread dies
-- that pool will still stay around, since the rootPool retains a
reference to it (the pools aren't actually tied to specific
threads).
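
A quick way to see that from the driver (assuming spark.scheduler.mode=FAIR;
getAllPools is a @DeveloperApi on SparkContext, and the pool name here is
made up):

    val t = new Thread(new Runnable {
      override def run(): Unit = {
        sc.setLocalProperty("spark.scheduler.pool", "ephemeral-pool")
        sc.parallelize(1 to 10).count()
        sc.setLocalProperty("spark.scheduler.pool", null)
      }
    })
    t.start()
    t.join()
    // the worker thread has exited, but the pool is still registered under rootPool
    println(sc.getAllPools.map(_.name).mkString(", "))  // still lists "ephemeral-pool"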

Imran

On Thu, Apr 5, 2018 at 9:46 PM, Matthias Boehm <mboehm7@gmail.com> wrote:

> Hi all,
>
> for concurrent Spark jobs spawned from the driver, we use Spark's fair
> scheduler pools, which are set and unset in a thread-local manner by
> each worker thread. Typically (for rather long jobs), this works very
> well. Unfortunately, in an application with lots of very short
> parallel sections, we see 1000s of these pools remaining in the Spark
> UI, which indicates some kind of leak. Each worker cleans up its local
> property by setting it to null, but not all pools are properly
> removed. I've checked and reproduced this behavior with Spark 2.1-2.3.
>
> Now my question: Is there a way to explicitly remove these pools,
> either globally, or locally while the thread is still alive?
>
> Regards,
> Matthias
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
