spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Srikanth <srikanth...@gmail.com>
Subject Re: Rebalancing when adding kafka partitions
Date Tue, 16 Aug 2016 19:17:15 GMT
Yes, SubscribePattern detects new partition. Also, it has a comment saying

Subscribe to all topics matching specified pattern to get dynamically
> assigned partitions.
>  * The pattern matching will be done periodically against topics existing
> at the time of check.
>  * @param pattern pattern to subscribe to
>  * @param kafkaParams Kafka


Who does the new partition discover? Underlying kafka consumer or
spark-streaming-kafka-0-10-assembly??

Srikanth

On Fri, Aug 12, 2016 at 5:15 PM, Cody Koeninger <cody@koeninger.org> wrote:

> Hrrm, that's interesting. Did you try with subscribe pattern, out of
> curiosity?
>
> I haven't tested repartitioning on the  underlying new Kafka consumer, so
> its possible I misunderstood something.
> On Aug 12, 2016 2:47 PM, "Srikanth" <srikanth.ht@gmail.com> wrote:
>
>> I did try a test with spark 2.0 + spark-streaming-kafka-0-10-assembly.
>> Partition was increased using "bin/kafka-topics.sh --alter" after spark
>> job was started.
>> I don't see messages from new partitions in the DStream.
>>
>>     KafkaUtils.createDirectStream[Array[Byte], Array[Byte]] (
>>>         ssc, PreferConsistent, Subscribe[Array[Byte],
>>> Array[Byte]](topics, kafkaParams) )
>>>     .map(r => (r.key(), r.value()))
>>
>>
>> Also, no.of partitions did not increase too.
>>
>>>     dataStream.foreachRDD( (rdd, curTime) => {
>>>      logger.info(s"rdd has ${rdd.getNumPartitions} partitions.")
>>
>>
>> Should I be setting some parameter/config? Is the doc for new integ
>> available?
>>
>> Thanks,
>> Srikanth
>>
>> On Fri, Jul 22, 2016 at 2:15 PM, Cody Koeninger <cody@koeninger.org>
>> wrote:
>>
>>> No, restarting from a checkpoint won't do it, you need to re-define the
>>> stream.
>>>
>>> Here's the jira for the 0.10 integration
>>>
>>> https://issues.apache.org/jira/browse/SPARK-12177
>>>
>>> I haven't gotten docs completed yet, but there are examples at
>>>
>>> https://github.com/koeninger/kafka-exactly-once/tree/kafka-0.10
>>>
>>> On Fri, Jul 22, 2016 at 1:05 PM, Srikanth <srikanth.ht@gmail.com> wrote:
>>> > In Spark 1.x, if we restart from a checkpoint, will it read from new
>>> > partitions?
>>> >
>>> > If you can, pls point us to some doc/link that talks about Kafka 0.10
>>> integ
>>> > in Spark 2.0.
>>> >
>>> > On Fri, Jul 22, 2016 at 1:33 PM, Cody Koeninger <cody@koeninger.org>
>>> wrote:
>>> >>
>>> >> For the integration for kafka 0.8, you are literally starting a
>>> >> streaming job against a fixed set of topicapartitions,  It will not
>>> >> change throughout the job, so you'll need to restart the spark job if
>>> >> you change kafka partitions.
>>> >>
>>> >> For the integration for kafka 0.10 / spark 2.0, if you use subscribe
>>> >> or subscribepattern, it should pick up new partitions as they are
>>> >> added.
>>> >>
>>> >> On Fri, Jul 22, 2016 at 11:29 AM, Srikanth <srikanth.ht@gmail.com>
>>> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > I'd like to understand how Spark Streaming(direct) would handle
>>> Kafka
>>> >> > partition addition?
>>> >> > Will a running job be aware of new partitions and read from it?
>>> >> > Since it uses Kafka APIs to query offsets and offsets are handled
>>> >> > internally.
>>> >> >
>>> >> > Srikanth
>>> >
>>> >
>>>
>>
>>

Mime
View raw message