spark-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: How to monitor Spark Streaming from Kafka?
Date Mon, 01 Jun 2015 22:57:56 GMT
I think you can use SPM - http://sematext.com/spm - it will give you all
Spark and all Kafka metrics, including offsets broken down by topic, etc.
out of the box.  I see more and more people using it to monitor various
components in data processing pipelines, a la
http://blog.sematext.com/2015/04/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/

Otis

On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg <dgoldenberg123@gmail.com>
wrote:

> Hi,
>
> What are some of the good/adopted approaches to monitoring Spark Streaming
> from Kafka?  I see that there are things like
> http://quantifind.github.io/KafkaOffsetMonitor, for example.  Do they all
> assume that Receiver-based streaming is used?
>
> Then "Note that one disadvantage of this approach (Receiverless Approach,
> #2) is that it does not update offsets in Zookeeper, hence Zookeeper-based
> Kafka monitoring tools will not show progress. However, you can access the
> offsets processed by this approach in each batch and update Zookeeper
> yourself".
>
> The code sample, however, seems sparse. What do you need to do here? -
>  directKafkaStream.foreachRDD(
>      new Function<JavaPairRDD<String, String>, Void>() {
>          @Override
>          public Void call(JavaPairRDD<String, String> rdd) throws IOException {
>              // Cast the underlying RDD, not the Java wrapper, to HasOffsetRanges
>              OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
>              // offsetRanges.length = # of Kafka partitions being consumed
>              ...
>              return null;
>          }
>      }
>  );
>
> and if these are updated, will KafkaOffsetMonitor work?
>
> Monitoring seems to center around the notion of a consumer group.  But in
> the receiverless approach, code on the Spark consumer side doesn't seem to
> expose a consumer group parameter.  Where does it go?  Can I/should I just
> pass in group.id as part of the kafkaParams HashMap?
>
> Thanks
>
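
The "update Zookeeper yourself" step quoted above can be sketched as follows. This is a minimal illustration, not the Spark or Kafka API: it assumes the Kafka 0.8 Zookeeper layout /consumers/<group>/offsets/<topic>/<partition>, which Zookeeper-based monitoring tools typically read. The Range class is a hypothetical stand-in for Spark's OffsetRange, and the group name "my-spark-app" and topic "events" are made up. The sketch only computes which znode paths and values would be written; the actual write needs a Zookeeper client and is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ZkOffsetPaths {

    /** Hypothetical stand-in for Spark's OffsetRange (not the real class). */
    public static final class Range {
        public final String topic;
        public final int partition;
        public final long fromOffset;   // first offset in the batch (inclusive)
        public final long untilOffset;  // end of the batch (exclusive)

        public Range(String topic, int partition, long fromOffset, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }
    }

    /** Znode path for one partition, assuming the Kafka 0.8 consumer layout. */
    public static String offsetPath(String group, String topic, int partition) {
        return "/consumers/" + group + "/offsets/" + topic + "/" + partition;
    }

    /**
     * Maps each znode path to the value to store after a batch completes.
     * Assumption: the stored value is the next offset to read, i.e. untilOffset.
     */
    public static Map<String, String> zkUpdates(String group, Range[] ranges) {
        Map<String, String> updates = new LinkedHashMap<>();
        for (Range r : ranges) {
            updates.put(offsetPath(group, r.topic, r.partition),
                        Long.toString(r.untilOffset));
        }
        return updates;
    }

    public static void main(String[] args) {
        // Two partitions of a made-up topic, as one batch might report them.
        Range[] batch = {
            new Range("events", 0, 100L, 150L),
            new Range("events", 1, 90L, 120L)
        };
        for (Map.Entry<String, String> e : zkUpdates("my-spark-app", batch).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

In a real job, the array would come from the offsetRanges obtained inside foreachRDD, and the group name would be whatever name you want the monitoring tool to display. Whether a given tool picks up offsets written this way should be verified against that tool.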
