spark-dev mailing list archives

From Surendranauth Hiraman <suren.hira...@velos.io>
Subject Re: CPU/Disk/network performance instrumentation
Date Wed, 09 Jul 2014 21:44:01 GMT
+1 on advanced tab.



On Wed, Jul 9, 2014 at 5:20 PM, Mridul Muralidharan <mridul@gmail.com> wrote:

> +1 on advanced mode!
>
> Regards.
> Mridul
>
> On Thu, Jul 10, 2014 at 12:55 AM, Reynold Xin <rxin@databricks.com> wrote:
> > Maybe it's time to create an advanced mode in the UI.
> >
> >
> > On Wed, Jul 9, 2014 at 12:23 PM, Kay Ousterhout <keo@eecs.berkeley.edu> wrote:
> >
> >> Hi all,
> >>
> >> I've been doing a bunch of performance measurement of Spark and, as part
> >> of doing this, added metrics that record the average CPU utilization,
> >> disk throughput and utilization for each block device, and network
> >> throughput while each task is running. These metrics are collected by
> >> reading the /proc filesystem, so they work only on Linux. I'm happy to
> >> submit a pull request with the appropriate changes, but first wanted to
> >> see if sufficiently many people think this would be useful. I know the
> >> metrics reported by Spark (and in the UI) are already overwhelming to
> >> some folks, so I don't want to add more instrumentation if it's not
> >> widely useful.
> >>
> >> These metrics are slightly more difficult to interpret for Spark than
> >> similar metrics reported by Hadoop because, with Spark, multiple tasks run
> >> in the same JVM and therefore as part of the same process. This means
> >> that, for example, the CPU utilization metrics reflect the CPU use across
> >> all tasks in the JVM, rather than only the CPU time used by the particular
> >> task. This is both a pro and a con: it makes it harder to determine why
> >> utilization is high (the load may come from a different task), but it
> >> also makes the metrics useful for diagnosing straggler problems. I just
> >> wanted to clarify this before asking folks to weigh in on whether the
> >> added metrics would be useful.
> >>
> >> -Kay
> >>
> >> (if you're curious, the instrumentation code is on a very messy branch
> >> here:
> >> https://github.com/kayousterhout/spark-1/tree/proc_logging_perf_minimal_temp/core/src/main/scala/org/apache/spark/performance_logging
> >> )
> >>
>
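To make the CPU caveat concrete, here is a minimal Python sketch (not the Scala code on the linked branch) of measuring a process's CPU utilization from /proc across one task's run. It assumes the counters are sampled once when the task starts and once when it ends, and it reads utime and stime, fields 14 and 15 of /proc/<pid>/stat, which are in clock ticks. Because /proc/self/stat describes the whole process, inside a Spark executor the result would cover every task running in that JVM, not only the one being measured. All function names are hypothetical.

```python
import os
import time

# Hypothetical helpers for illustration; not Spark's or the linked branch's API.


def parse_proc_stat_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime, in clock ticks, from one /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may contain
    spaces or ')', so split after the last ')' before indexing.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    utime, stime = int(rest[14 - 3]), int(rest[15 - 3])
    return utime + stime


def cpu_utilization(ticks_before: int, ticks_after: int, elapsed_s: float,
                    clk_tck: int, num_cpus: int) -> float:
    """Share of total machine CPU capacity the process used in the interval."""
    cpu_seconds = (ticks_after - ticks_before) / clk_tck
    return cpu_seconds / (elapsed_s * num_cpus)


def read_self_cpu_ticks() -> int:
    # /proc/self/stat describes the whole calling process; in a Spark executor
    # that would be the JVM shared by all concurrently running tasks.
    with open("/proc/self/stat") as f:
        return parse_proc_stat_cpu_ticks(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):  # Linux only
    clk_tck = os.sysconf("SC_CLK_TCK")
    before, t0 = read_self_cpu_ticks(), time.monotonic()
    sum(i * i for i in range(2_000_000))  # stand-in for a running task
    after, t1 = read_self_cpu_ticks(), time.monotonic()
    print(cpu_utilization(before, after, t1 - t0, clk_tck, os.cpu_count() or 1))
```

Dividing by `num_cpus` expresses utilization as a share of the whole machine; omitting it would give the number of cores kept busy instead. Which normalization the proposed Spark metrics use is not stated in the thread.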

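The per-device disk figures and the network throughput Kay describes can be computed by sampling cumulative /proc counters at two points in time and differencing them. The sketch below is a minimal Python illustration under assumed details, not the branch's implementation: it takes per-device sectors read, sectors written, and milliseconds spent doing I/O from /proc/diskstats (sectors counted in 512-byte units), and per-interface received and transmitted bytes from /proc/net/dev. Utilization follows the iostat-style definition of time with I/O in flight divided by elapsed time. All names are hypothetical.

```python
import os
import time
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical helpers for illustration; not Spark's or the linked branch's API.

SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


@dataclass
class DiskCounters:
    sectors_read: int
    sectors_written: int
    io_ms: int  # cumulative milliseconds the device had I/O in flight


def parse_diskstats(text: str) -> Dict[str, DiskCounters]:
    """Map block-device name to its cumulative counters from /proc/diskstats."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:  # major, minor, name, then at least 11 counters
            continue
        result[parts[2]] = DiskCounters(
            sectors_read=int(parts[5]),
            sectors_written=int(parts[9]),
            io_ms=int(parts[12]),
        )
    return result


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name to (rx_bytes, tx_bytes) from /proc/net/dev."""
    result = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # "eth0:123" may lack a space
        fields = data.split()
        result[name.strip()] = (int(fields[0]), int(fields[8]))
    return result


def disk_rates(before, after, elapsed_s):
    """Per device: (read bytes/s, write bytes/s, fraction of time busy)."""
    rates = {}
    for dev, b in before.items():
        a = after.get(dev)
        if a is None:
            continue
        rates[dev] = (
            (a.sectors_read - b.sectors_read) * SECTOR_BYTES / elapsed_s,
            (a.sectors_written - b.sectors_written) * SECTOR_BYTES / elapsed_s,
            (a.io_ms - b.io_ms) / (elapsed_s * 1000.0),
        )
    return rates


def net_rates(before, after, elapsed_s):
    """Per interface: (received bytes/s, transmitted bytes/s)."""
    return {
        iface: ((after[iface][0] - rx) / elapsed_s, (after[iface][1] - tx) / elapsed_s)
        for iface, (rx, tx) in before.items()
        if iface in after
    }


def snapshot():
    with open("/proc/diskstats") as d, open("/proc/net/dev") as n:
        return parse_diskstats(d.read()), parse_net_dev(n.read()), time.monotonic()


if __name__ == "__main__" and all(
        os.path.exists(p) for p in ("/proc/diskstats", "/proc/net/dev")):  # Linux only
    disk0, net0, t0 = snapshot()
    time.sleep(1.0)  # stand-in for the task's run time
    disk1, net1, t1 = snapshot()
    print(disk_rates(disk0, disk1, t1 - t0))
    print(net_rates(net0, net1, t1 - t0))
```

Every counter in these files is cumulative, which is why the sketch works with deltas over an interval. Like the process-wide CPU figure, these counters cover all I/O on a device or interface, so they would include traffic from other tasks running at the same time.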


-- 

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io
