spark-user mailing list archives

From Bryan Jeffrey <bryan.jeff...@gmail.com>
Subject Re: Access batch statistics in Spark Streaming
Date Mon, 08 Feb 2016 19:05:10 GMT
From within a Spark job you can register a periodic StreamingListener:

ssc.addStreamingListener(new PeriodicStatisticsListener(Seconds(60)))

import org.apache.spark.streaming.{Duration, Time}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
import org.joda.time.{DateTime, DateTimeZone}
import org.slf4j.LoggerFactory

class PeriodicStatisticsListener(timePeriod: Duration) extends StreamingListener {
  private val logger = LoggerFactory.getLogger("Application")
  private var startTime = Time(0) // set on the first completed batch

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    if (startTime == Time(0)) {
      startTime = batchCompleted.batchInfo.batchTime
    }
    logger.info("Batch Complete @ " +
      new DateTime(batchCompleted.batchInfo.batchTime.milliseconds).withZone(DateTimeZone.UTC) +
      " (" + batchCompleted.batchInfo.batchTime + ")" +
      " with records " + batchCompleted.batchInfo.numRecords +
      " in processing time " +
      batchCompleted.batchInfo.processingDelay.getOrElse(0L) / 1000 + " seconds")
  }
}
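
For the alerting use case in the quoted question, the same callback exposes
schedulingDelay and totalDelay on BatchInfo (both Option[Long], in
milliseconds, matching what the Streaming tab displays). A minimal sketch,
assuming a hypothetical alert() hook you would wire to your own monitoring
system:

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

class DelayAlertListener(maxSchedulingDelayMs: Long) extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // schedulingDelay is None until the scheduler records it; treat that as 0.
    val schedulingDelay = info.schedulingDelay.getOrElse(0L)
    if (schedulingDelay > maxSchedulingDelayMs) {
      alert("Scheduling delay " + schedulingDelay + " ms exceeded threshold for batch " + info.batchTime)
    }
  }

  // Hypothetical alert hook -- replace with your alerting/monitoring client.
  private def alert(message: String): Unit = println(message)
}

Register it the same way, e.g. ssc.addStreamingListener(new DelayAlertListener(30000)).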

On Mon, Feb 8, 2016 at 11:34 AM, Chen Song <chen.song.82@gmail.com> wrote:

> Apologies in advance if someone has already asked and addressed this
> question.
>
> In Spark Streaming, how can I programmatically get the batch statistics
> like scheduling delay, total delay, and processing time (they are shown in
> the Streaming tab of the job UI)? I need such information to raise alerts
> in some circumstances, for example if the scheduling delay exceeds a
> threshold.
>
> Thanks,
> Chen
