spark-dev mailing list archives

From asma zgolli <zgollia...@gmail.com>
Subject How to develop a listener that collects the statistics for spark sql and execution time for each operator
Date Tue, 19 Mar 2019 16:57:21 GMT
Hello,


I'm looking for a way to develop a listener that collects the statistics
for Spark SQL queries, as well as the execution time of each physical
operator in the physical plan, and stores them in a database.

I want to develop an application similar to the following one:

import org.apache.spark.scheduler._
import org.apache.log4j.LogManager

val logger = LogManager.getLogger("CustomListener")

class CustomListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    logger.warn(s"Stage completed, runTime: ${stageCompleted.stageInfo.taskMetrics.executorRunTime}, " +
      s"cpuTime: ${stageCompleted.stageInfo.taskMetrics.executorCpuTime}")
  }
}

val myListener = new CustomListener
// sc is the active SparkContext
sc.addSparkListener(myListener)

// Run a simple Spark job and note the additional warning messages
// emitted by the CustomListener with Spark execution metrics, for example:
spark.time(sql("select count(*) from range(1e4) cross join range(1e4)").show)


but for Spark SQL runtime statistics.
I want to store the same statistics as the ones displayed at
http://localhost:4040/SQL/execution/?id=1
i.e. the ones shown in the attached picture.
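
For the SQL-level statistics, one possible starting point (a minimal sketch, not a complete solution) is Spark's QueryExecutionListener, which is invoked once per query with the QueryExecution object; from there the executed physical plan can be walked and each operator's SQLMetric values read, which are the same per-operator metrics the SQL tab of the UI displays. Note that the database-write step here is only a placeholder println, and metric values are only populated after the query has actually run:

import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

class SqlMetricsListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // Overall query wall-clock time, reported by Spark in nanoseconds.
    println(s"Query '$funcName' finished in ${durationNs / 1e6} ms")
    // Walk every node of the executed physical plan and dump its metrics.
    qe.executedPlan.foreach { plan =>
      plan.metrics.foreach { case (name, metric) =>
        // Placeholder: replace this println with an INSERT into your database.
        println(s"${plan.nodeName} - $name: ${metric.value}")
      }
    }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

// Register it on the active SparkSession:
// spark.listenerManager.register(new SqlMetricsListener)

One caveat: with whole-stage code generation enabled, some operators are fused into a WholeStageCodegen node, so the metrics you see per node may not map one-to-one onto the operators shown in the UI.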

Thank you very much.


yours sincerely,
Asma ZGOLLI

PhD student in data engineering - computer science
Email : zgolliasma@gmail.com
email alt:  asma.zgolli@univ-grenoble-alpes.fr
