spark-dev mailing list archives

From Reynold Xin <>
Subject Re: CPU/Disk/network performance instrumentation
Date Wed, 09 Jul 2014 19:25:16 GMT
Maybe it's time to create an advanced mode in the UI.

On Wed, Jul 9, 2014 at 12:23 PM, Kay Ousterhout <> wrote:

> Hi all,
> I've been doing a bunch of performance measurement of Spark and, as part of
> doing this, added metrics that record the average CPU utilization, disk
> throughput and utilization for each block device, and network throughput
> while each task is running.  These metrics are collected by reading the
> /proc filesystem, so they work only on Linux.  I'm happy to submit a pull
> request with the appropriate changes, but first wanted to see if
> sufficiently many people think this would be useful.  I know the metrics
> reported by Spark (and in the UI) are already overwhelming to some folks,
> so I don't want to add more instrumentation if it's not widely useful.
> These metrics are slightly more difficult to interpret for Spark than
> similar metrics reported by Hadoop because, with Spark, multiple tasks run
> in the same JVM and therefore as part of the same process.  This means
> that, for example, the CPU utilization metrics reflect the CPU use across
> all tasks in the JVM, rather than only the CPU time used by the particular
> task.  This is both a pro and a con -- it makes it harder to determine why
> utilization is high (it may be from a different task), but it also makes the
> metrics useful for diagnosing straggler problems.  Just wanted to clarify
> this before asking folks to weigh in on whether the added metrics would be
> useful.
> -Kay
> (if you're curious, the instrumentation code is on a very messy branch
> here: )
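
The message says the metrics come from /proc but does not say which files or how the averages are formed. The sketch below is one common way to do it, not the code on Kay's branch: snapshot cumulative kernel counters when a task starts and ends, then divide the deltas by the elapsed time. It assumes the standard Linux layouts of /proc/stat (aggregate `cpu` line, in clock ticks), /proc/diskstats (sectors read/written and milliseconds spent doing I/O, in 512-byte sectors) and /proc/net/dev (receive and transmit byte columns). All function names are hypothetical, and the inputs are the raw file contents so the parsing can be exercised without a live system.

```python
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def read_proc(path: str) -> str:
    """Read a /proc file (Linux only)."""
    with open(path) as fh:
        return fh.read()


def cpu_busy_total(proc_stat: str) -> Tuple[int, int]:
    """Return (busy, total) clock ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in proc_stat.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already counted inside user/nice, so those columns are skipped.
            ticks = [int(v) for v in fields[1:9]]
            idle = ticks[3] + ticks[4]  # idle + iowait
            total = sum(ticks)
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: str, after: str) -> float:
    """Fraction of all CPU time that was busy between two /proc/stat snapshots."""
    busy0, total0 = cpu_busy_total(before)
    busy1, total1 = cpu_busy_total(after)
    elapsed = total1 - total0
    return (busy1 - busy0) / elapsed if elapsed else 0.0


def disk_counters(diskstats: str) -> Dict[str, Tuple[int, int, int]]:
    """Map device -> (sectors read, sectors written, ms spent doing I/O)."""
    counters = {}
    for line in diskstats.splitlines():
        f = line.split()
        if len(f) >= 13:  # major minor name + at least 10 counters
            counters[f[2]] = (int(f[5]), int(f[9]), int(f[12]))
    return counters


def disk_rates(before: str, after: str,
               elapsed_s: float) -> Dict[str, Tuple[float, float, float]]:
    """Per device: (read bytes/s, write bytes/s, fraction of time busy)."""
    b, a = disk_counters(before), disk_counters(after)
    rates = {}
    for dev, (rd0, wr0, io0) in b.items():
        if dev in a:
            rd1, wr1, io1 = a[dev]
            rates[dev] = (
                (rd1 - rd0) * SECTOR_BYTES / elapsed_s,
                (wr1 - wr0) * SECTOR_BYTES / elapsed_s,
                (io1 - io0) / (elapsed_s * 1000.0),  # io ms / elapsed ms
            )
    return rates


def net_counters(net_dev: str) -> Dict[str, Tuple[int, int]]:
    """Map interface -> (bytes received, bytes transmitted) from /proc/net/dev."""
    counters = {}
    for line in net_dev.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        f = data.split()
        counters[iface.strip()] = (int(f[0]), int(f[8]))
    return counters


def net_rates(before: str, after: str,
              elapsed_s: float) -> Dict[str, Tuple[float, float]]:
    """Per interface: (receive bytes/s, transmit bytes/s)."""
    b, a = net_counters(before), net_counters(after)
    return {
        iface: ((a[iface][0] - rx) / elapsed_s, (a[iface][1] - tx) / elapsed_s)
        for iface, (rx, tx) in b.items()
        if iface in a
    }
```

In use, each file would be read with `read_proc` at task start and end, with `time.monotonic()` supplying `elapsed_s`. Because these counters are machine-wide, the results cover every task on the host (and any non-Spark process), a scope even broader than the JVM-wide CPU figure the message describes.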

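The attribution problem in the second paragraph (process-level CPU time mixes all tasks in the JVM) can be narrowed on Linux, because the kernel also keeps per-thread counters in /proc/<pid>/task/<tid>/stat. The sketch below is an alternative not proposed in the message: it compares one thread's utime + stime with the whole process's over the same interval. It assumes a task runs on a single executor thread for its whole duration; CPU spent on the task's behalf by other threads (for example, JVM garbage-collection threads) would still appear only in the process total. Function names are hypothetical, and mapping a JVM thread to its native thread ID is not shown.

```python
import threading


def cpu_ticks(stat_line: str) -> int:
    """utime + stime, in clock ticks, from a /proc/<pid>/stat or
    /proc/<pid>/task/<tid>/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces, so parse after
    # the last ')'. The remaining fields start at field 3, so field n is at
    # index n - 3: utime (field 14) -> 11, stime (field 15) -> 12.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[11]) + int(rest[12])


def thread_share(proc_before: str, proc_after: str,
                 thread_before: str, thread_after: str) -> float:
    """Fraction of the process's CPU time over an interval used by one thread."""
    proc_delta = cpu_ticks(proc_after) - cpu_ticks(proc_before)
    thread_delta = cpu_ticks(thread_after) - cpu_ticks(thread_before)
    return thread_delta / proc_delta if proc_delta else 0.0


def current_thread_stat_path() -> str:
    """Stat file of the calling thread (Linux; get_native_id needs Python 3.8+)."""
    return f"/proc/self/task/{threading.get_native_id()}/stat"
```

A share well below 1.0 for a slow task would suggest the process's CPU was mostly consumed by other threads, which is the "may be from a different task" case the message describes.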