spark-user mailing list archives

From Ruslan Dautkhanov <dautkha...@gmail.com>
Subject Re: How to monitor Spark Streaming from Kafka?
Date Tue, 02 Jun 2015 22:22:52 GMT
Nobody has mentioned Cloudera Manager (CM) yet? Kafka is now supported by CM/CDH 5.4:

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-kafka/latest/PDF/cloudera-kafka.pdf




-- 
Ruslan Dautkhanov

On Mon, Jun 1, 2015 at 5:19 PM, Dmitry Goldenberg <dgoldenberg123@gmail.com>
wrote:

> Thank you, Tathagata, Cody, Otis.
>
> - Dmitry
>
>
> On Mon, Jun 1, 2015 at 6:57 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
>> I think you can use SPM - http://sematext.com/spm - it will give you all
>> Spark and all Kafka metrics, including offsets broken down by topic, etc.
>> out of the box.  I see more and more people using it to monitor various
>> components in data processing pipelines, a la
>> http://blog.sematext.com/2015/04/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/
>>
>> Otis
>>
>> On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg <dgoldenberg123@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> What are some of the good/adopted approaches to monitoring Spark Streaming
>>> from Kafka?  I see that there are things like
>>> http://quantifind.github.io/KafkaOffsetMonitor, for example.  Do they all
>>> assume that Receiver-based streaming is used?
>>>
>>> Then: "Note that one disadvantage of this approach (Receiverless Approach,
>>> #2) is that it does not update offsets in Zookeeper, hence Zookeeper-based
>>> Kafka monitoring tools will not show progress. However, you can access the
>>> offsets processed by this approach in each batch and update Zookeeper
>>> yourself."
>>>
>>> The code sample, however, seems sparse. What do you need to do here?
>>>
>>>  directKafkaStream.foreachRDD(
>>>      new Function<JavaPairRDD<String, String>, Void>() {
>>>          @Override
>>>          public Void call(JavaPairRDD<String, String> rdd) throws IOException {
>>>              // Cast the underlying RDD to read the per-partition offsets;
>>>              // offsetRanges.length == number of Kafka partitions consumed.
>>>              OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
>>>              ...
>>>              return null;
>>>          }
>>>      }
>>>  );
>>>
>>> and if these are updated, will KafkaOffsetMonitor work?
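As a rough sketch (not from the thread): for KafkaOffsetMonitor to see progress, each batch's `untilOffset` values would have to be written to the ZooKeeper nodes that Kafka's 0.8-era high-level consumer uses, `/consumers/<group>/offsets/<topic>/<partition>`. The `SimpleOffset` class below is a hypothetical stand-in for Spark's `OffsetRange` so the example is self-contained; a real implementation would take the ranges from `offsetRanges()` and perform the writes with a ZooKeeper client such as Curator.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetPaths {
    // Hypothetical minimal mirror of Spark's OffsetRange: just the fields
    // needed to record progress (topic, partition, end offset of the batch).
    static final class SimpleOffset {
        final String topic;
        final int partition;
        final long untilOffset;
        SimpleOffset(String topic, int partition, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.untilOffset = untilOffset;
        }
    }

    // ZooKeeper path layout used by Kafka 0.8-era consumers for offsets.
    static String zkOffsetPath(String group, String topic, int partition) {
        return "/consumers/" + group + "/offsets/" + topic + "/" + partition;
    }

    // Turn one batch's offset ranges into (path, offset-string) pairs that
    // a ZooKeeper client would then write, one setData/create per pair.
    static List<String[]> toWrites(String group, List<SimpleOffset> ranges) {
        List<String[]> writes = new ArrayList<>();
        for (SimpleOffset r : ranges) {
            writes.add(new String[] {
                zkOffsetPath(group, r.topic, r.partition),
                Long.toString(r.untilOffset)
            });
        }
        return writes;
    }

    public static void main(String[] args) {
        List<SimpleOffset> ranges = new ArrayList<>();
        ranges.add(new SimpleOffset("events", 0, 42L));
        for (String[] w : toWrites("my-group", ranges)) {
            System.out.println(w[0] + " -> " + w[1]);
        }
    }
}
```

If the offsets land under the same group/topic/partition paths the monitoring tool reads, KafkaOffsetMonitor should show lag as usual; the tool itself doesn't care who wrote the nodes.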
>>>
>>> Monitoring seems to center around the notion of a consumer group.  But in
>>> the receiverless approach, the code on the Spark consumer side doesn't
>>> seem to expose a consumer group parameter.  Where does it go?  Can/should
>>> I just pass in group.id as part of the kafkaParams HashMap?
>>>
>>> Thanks
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-monitor-Spark-Streaming-from-Kafka-tp23103.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>
