spark-user mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: Best way to emit custom metrics to Prometheus in spark structured streaming
Date Tue, 03 Nov 2020 04:46:49 GMT
You can try out "Dataset.observe" added in Spark 3, which enables arbitrary
metrics to be logged and exposed to streaming query listeners.
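
For example, a minimal sketch against the (id, key, value) schema from the
question; the metric name "drop_metrics" and the println are placeholders
for the real Prometheus export:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Attach a named metric to the same query -- no second readStream needed.
val observed = df.observe(
  "drop_metrics",
  sum(when(col("key") === "droppedRecords", col("value").cast("long"))
    .otherwise(0L)).as("droppedRecords"))

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // observedMetrics is keyed by the name passed to observe()
    val metrics = event.progress.observedMetrics.get("drop_metrics")
    if (metrics != null) {
      println(s"droppedRecords=${metrics.getAs[Long]("droppedRecords")}")
    }
  }
})

Use `observed` in place of `df` in the existing writeStream; the metric is
then computed on exactly the data that query processes.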

On Tue, Nov 3, 2020 at 3:25 AM meetwes <meetwes@gmail.com> wrote:

> Hi, I am looking for the right approach to emit custom metrics from a
> Spark Structured Streaming job. *Actual Scenario:*
> I have an aggregated dataframe, let's say with (id, key, value) columns.
> One of the KPIs could be 'droppedRecords', where the corresponding value
> column holds the number of dropped records. I need to filter all the rows
> with key 'droppedRecords' and compute the sum over their value column.
>
> *Challenges:*
> 1) Need to use only one streaming query so that the metrics are accurate
> (1 readStream and 1 writeStream). If the metrics are emitted in a
> separate query, inconsistencies can arise from the varying watermark time
> between the query that does the aggregation and the one that extracts
> only the metrics.
>
> *I evaluated some of the approaches:*
> 1) *foreachBatch sink:* This works for emitting metrics, but there are
> other issues, e.g. the numOutputRows reported in the logs is always -1.
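>
> For reference, a minimal sketch of this approach (the dropGauge call is a
> hypothetical Prometheus gauge; the real sink write is elided):
>
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.functions._
>
> df.writeStream
>   .foreachBatch { (batch: DataFrame, batchId: Long) =>
>     batch.persist()
>     // compute the metric on exactly the data this micro-batch writes
>     val dropped = batch
>       .filter(col("key") === "droppedRecords")
>       .agg(coalesce(sum(col("value").cast("long")), lit(0L)))
>       .first()
>       .getLong(0)
>     // dropGauge.set(dropped)  // hypothetical Prometheus gauge
>     // ... write `batch` to the actual sink here ...
>     batch.unpersist()
>   }
>   .start()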
>
> 2) *Using accumulators:*
>
> import org.apache.spark.util.LongAccumulator
> import spark.implicits._
>
> // case class matching the (id, key, value) schema from above
> case class Kpi(id: String, key: String, value: String)
>
> val dropCounts: LongAccumulator = new LongAccumulator
> spark.sparkContext.register(dropCounts, "Drop Counts Accumulator")
>
> df.as[Kpi].map { row =>
>   dropCounts.add(row.value.toLong)
>   row  // pass the row through so the query output is unchanged
> }
>
> This approach seems to hit a bug in Spark: the executors add to the
> accumulator correctly, but the count on the driver is always 0.
>
> 3) *Using mapGroupsWithState:* This requires an action on the aggregated
> dataframe to retrieve the metrics, and therefore creates another
> streaming query (see the sketch below).
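>
> A minimal sketch, reusing the Kpi case class from the accumulator snippet
> above (grouping by id is an illustrative assumption):
>
> import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}
>
> val dropTotals = df.as[Kpi]
>   .filter(_.key == "droppedRecords")
>   .groupByKey(_.id)
>   .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
>     (id: String, rows: Iterator[Kpi], state: GroupState[Long]) =>
>       val total = state.getOption.getOrElse(0L) + rows.map(_.value.toLong).sum
>       state.update(total)
>       (id, total)
>   }
> // dropTotals must itself be written out to be evaluated -- hence the
> // second streaming query.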
>
> I am using Spark 3.0.1. What would be the best way to implement custom
> metrics?
> ------------------------------
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
