kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adis Ababa <idichettaku...@gmail.com>
Subject Spark per topic num of partitions doubt
Date Wed, 28 Sep 2016 03:21:22 GMT
I have asked the question on stackoverflow as well here

I am confused about the "per topic number of partitions" parameter when
creating a inputDstream using KafkaUtils.createStream(...) method.

I am pasting the question here, please help.

>From [Spark Documentation][1]

> parameter topicMap of KafkaUtils.createStream(...) method determines
 "per-topic number of Kafka partitions to consume" [Javadoc here][2]

So, when I created a kafka topic with 3 partitions and started a spark
receiver as

    Map<String, Integer> topicMap = new HashMap<>();
        topicMap.put(topic, 1);
    JavaPairReceiverInputDStream<String,String> inputDStream =
          KafkaUtils.createStream(javaStreamingContext, zookeeperQuorum,
groupId, topicMap);

I expected this receiver to receive messages from ONLY one partition of the
3 partitions that I created. However, when I check the offset checker, I
see the following:

    Pid Offset          logSize         Lag             Owner
    0   9               9               0               none
    1   11              11              0               none
    2   7               7               0               none

I expected this code to receive messages from one partition and then I
thought I needed to start more receivers (one per partition) as given in
the [documentation here][3] to cover all Kafka topic partitions.

    int numStreams = 3;
    List<JavaPairDStream<String, String>> kafkaStreams = new
    for (int i = 0; i < numStreams; i++) {
    JavaPairDStream<String, String> unifiedStream =
streamingContext.union(kafkaStreams.get(0), kafkaStreams.subList(1,

So, my question is can one receiver receive messages from all partitions?
If so, what in the world does the topicMap(topic -> numPartitions) mean?

  [1]: http://spark.apache.org/docs/latest/streaming-kafka-integration.html

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message