spark-issues mailing list archives

From "Luca Canali (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-30306) Instrument Python UDF execution time and metrics using Spark Metrics system
Date Thu, 19 Dec 2019 14:55:00 GMT
Luca Canali created SPARK-30306:
-----------------------------------

             Summary: Instrument Python UDF execution time and metrics using Spark Metrics system
                 Key: SPARK-30306
                 URL: https://issues.apache.org/jira/browse/SPARK-30306
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, Spark Core
    Affects Versions: 3.0.0
            Reporter: Luca Canali


This proposes extending Spark instrumentation with metrics aimed at understanding the performance
of Python code called by Spark via UDF, Pandas UDF, or mapPartitions. The relevant performance
counters are exposed through the Spark Metrics System (based on the Dropwizard library). This
makes it easy to consume the metrics produced by executors, for example in a performance
dashboard. See also the attached screenshot.
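
As a rough illustration of the kind of measurement involved (time spent in Python code and rows
processed per partition), here is a minimal user-side sketch in plain Python. It is not the
instrumentation proposed in this ticket, which would report through the Spark Metrics System; the
names `timed_partition`, `counters`, `python_time_s` and `rows` are hypothetical and exist only
for this example.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def timed_partition(
    func: Callable[[T], U], counters: Dict[str, float]
) -> Callable[[Iterable[T]], Iterator[U]]:
    """Wrap a per-row function so it can be applied to a whole partition.

    Hypothetical helper: accumulates the wall-clock time spent inside `func`
    and the number of rows processed into the `counters` dictionary.
    """

    def run(rows: Iterable[T]) -> Iterator[U]:
        for row in rows:
            start = time.perf_counter()
            result = func(row)
            elapsed = time.perf_counter() - start
            # Only time spent inside the user function is counted,
            # not the time the consumer takes between rows.
            counters["python_time_s"] = counters.get("python_time_s", 0.0) + elapsed
            counters["rows"] = counters.get("rows", 0) + 1
            yield result

    return run


# Local usage on a plain iterator standing in for one partition.
counters: Dict[str, float] = {}
process = timed_partition(lambda x: x * x, counters)
squares = list(process(range(5)))
```

In a Spark job the wrapped function could be passed to `rdd.mapPartitions(process)`, but the
counters would then live inside each Python worker process and would not be visible on the driver
or in a dashboard. That limitation is the gap this proposal aims to close.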



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


