spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Imran Rashid (JIRA)" <>
Subject [jira] [Resolved] (SPARK-26329) ExecutorMetrics should poll faster than heartbeats
Date Thu, 01 Aug 2019 14:11:00 GMT


Imran Rashid resolved SPARK-26329.
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 23767

> ExecutorMetrics should poll faster than heartbeats
> --------------------------------------------------
>                 Key: SPARK-26329
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Web UI
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Assignee: Wing Yew Poon
>            Priority: Major
>             Fix For: 3.0.0
> We should allow faster polling of the executor memory metrics (SPARK-23429 / SPARK-23206)
without requiring a faster heartbeat rate.  We've seen the memory usage of executors pike
over 1 GB in less than a second, but heartbeats are only every 10 seconds (by default).  Spark
needs to enable fast polling to capture these peaks, without causing too much strain on the
> In the current implementation, the metrics are polled along with the heartbeat, but this
leads to a slow rate of polling metrics by default.  If users were to increase the rate of
the heartbeat, they risk overloading the driver on a large cluster, with too many messages
and too much work to aggregate the metrics.  But, the executor could poll the metrics more
frequently, and still only send the *max* since the last heartbeat for each metric.  This
keeps the load on the driver the same, and only introduces a small overhead on the executor
to grab the metrics and keep the max.
> The downside of this approach is that we still need to wait for the next heartbeat for
the driver to be aware of the new peak.   If the executor dies or is killed before then, then
we won't find out.  A potential future enhancement would be to send an update *anytime* there
is an increase by some percentage, but we'll leave that out for now.
> Another possibility would be to change the metrics themselves to track peaks for us,
so we don't have to fine-tune the polling rate.  For example, some jvm metrics provide a usage
threshold, and notification:
> But, that is not available on all metrics.  This proposal gives us a generic way to get
a more accurate peak memory usage for *all* metrics.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message