spark-user mailing list archives

From dgoldenberg <>
Subject How to monitor Spark Streaming from Kafka?
Date Mon, 01 Jun 2015 21:23:13 GMT

What are some of the good/adopted approaches to monitoring Spark Streaming
from Kafka?  I see that there are tools like KafkaOffsetMonitor, for example.  Do they all
assume that Receiver-based streaming is used?

Then there's this caveat about the receiverless approach (#2): "Note that one
disadvantage of this approach is that it does not update offsets in Zookeeper,
hence Zookeeper-based Kafka monitoring tools will not show progress. However,
you can access the offsets processed by this approach in each batch and update
Zookeeper yourself."

The code sample, however, seems sparse. What do you need to do here? -

    directKafkaStream.foreachRDD(
        new Function<JavaPairRDD<String, String>, Void>() {
            public Void call(JavaPairRDD<String, String> rdd) throws IOException {
                // Cast the underlying RDD to HasOffsetRanges to recover offsets
                OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
                // offsetRanges.length = # of Kafka partitions being consumed
                return null;
            }
        }
    );
And if these offsets are updated in Zookeeper, will KafkaOffsetMonitor work?
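For what it's worth, here is a minimal, self-contained sketch of what "update Zookeeper yourself" might look like. It only shows the classic Zookeeper node layout (/consumers/<group>/offsets/<topic>/<partition>) that Zookeeper-based tools such as KafkaOffsetMonitor read, and how an offset would be serialized for that node; the class name, group name, topic name, and offsets below are all made up, and the actual write would go through a Zookeeper client (e.g. Curator's setData().forPath(path, bytes)), which is omitted here:

    import java.nio.charset.StandardCharsets;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class OffsetPaths {
        // Classic Kafka 0.8 consumer-offset node layout in Zookeeper
        static String offsetPath(String group, String topic, int partition) {
            return String.format("/consumers/%s/offsets/%s/%d", group, topic, partition);
        }

        // Offsets are stored as the decimal string's UTF-8 bytes
        static byte[] offsetBytes(long untilOffset) {
            return Long.toString(untilOffset).getBytes(StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // Pretend these came from ((HasOffsetRanges) rdd.rdd()).offsetRanges():
            // partition -> untilOffset of the batch just processed
            Map<Integer, Long> untilOffsets = new LinkedHashMap<>();
            untilOffsets.put(0, 1234L);
            untilOffsets.put(1, 5678L);
            for (Map.Entry<Integer, Long> e : untilOffsets.entrySet()) {
                String path = offsetPath("my-group", "events", e.getKey());
                String value = new String(offsetBytes(e.getValue()), StandardCharsets.UTF_8);
                // A real implementation would setData() on this path instead of printing
                System.out.println(path + " -> " + value);
            }
        }
    }

If the job writes each batch's untilOffset to those paths, a Zookeeper-based monitor should see the group's progress, since that is the same layout the high-level consumer used to commit to.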

Monitoring seems to center around the notion of a consumer group.  But in
the receiverless approach, code on the Spark consumer side doesn't seem to
expose a consumer group parameter.  Where does it go?  Can/should I just
pass it in as part of the kafkaParams HashMap?
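For reference, here is a small sketch of what that might look like. The broker addresses and group name are placeholders; the direct (receiverless) API itself talks to brokers via metadata.broker.list and does not use a consumer group, but nothing stops you from carrying a "group.id" entry in the same map and reusing it when writing offsets back to Zookeeper for monitoring:

    import java.util.HashMap;
    import java.util.Map;

    public class DirectStreamParams {
        static Map<String, String> kafkaParams() {
            Map<String, String> p = new HashMap<>();
            // Required by createDirectStream: the broker list (placeholder hosts);
            // note there is no Zookeeper quorum here
            p.put("metadata.broker.list", "broker1:9092,broker2:9092");
            // Not consumed by the direct API itself, but keeping the group name
            // here makes it available for your own offset bookkeeping
            p.put("group.id", "my-streaming-group");
            return p;
        }

        public static void main(String[] args) {
            System.out.println(kafkaParams());
        }
    }

The map would then be handed to KafkaUtils.createDirectStream along with the topic set, and the "group.id" value reused as the <group> segment of the Zookeeper offset path.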
