spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Koeninger <c...@koeninger.org>
Subject Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch
Date Sat, 03 Sep 2016 14:03:05 GMT
Not built in, you're going to have to do some work.

On Sep 2, 2016 16:33, "sagarcasual ." <sagarcasual@gmail.com> wrote:

> Hi Cody, thanks for the reply.
> I am using Spark 1.6.1 with Kafka 0.9.
> When I want to stop streaming, stopping the context sounds ok, but for
> temporarily excluding partitions is there any way I can supply
> topic-partition info on the fly at the beginning of every pull dynamically.
> Will streaminglistener be of any help?
>
> On Fri, Sep 2, 2016 at 10:37 AM, Cody Koeninger <cody@koeninger.org>
> wrote:
>
>> If you just want to pause the whole stream, just stop the app and then
>> restart it when you're ready.
>>
>> If you want to do some type of per-partition manipulation, you're
>> going to need to write some code.  The 0.10 integration makes the
>> underlying kafka consumer pluggable, so you may be able to wrap a
>> consumer to do what you need.
>>
>> On Fri, Sep 2, 2016 at 12:28 PM, sagarcasual . <sagarcasual@gmail.com>
>> wrote:
>> > Hello, this is for
>> > Pausing spark kafka streaming (direct) or exclude/include some
>> partitions on
>> > the fly per batch
>> > =========================================================
>> > I have following code that creates a direct stream using Kafka
>> connector for
>> > Spark.
>> >
>> > final JavaInputDStream<KafkaMessage> msgRecords =
>> > KafkaUtils.createDirectStream(
>> >             jssc, String.class, String.class, StringDecoder.class,
>> > StringDecoder.class,
>> >             KafkaMessage.class, kafkaParams, topicsPartitions,
>> >             message -> {
>> >                 return KafkaMessage.builder()
>> >                         .
>> >                         .build();
>> >             }
>> >     );
>> >
>> > However I want to handle a situation, where I can decide that this
>> streaming
>> > needs to pause for a while on conditional basis, is there any way to
>> achieve
>> > this? Say my Kafka is undergoing some maintenance, so between 10AM to
>> 12PM
>> > stop processing, and then again pick up at 12PM from the last offset,
>> how do
>> > I do it?
>> >
>> > Also, assume all of a sudden we want to take one-or-more of the
>> partitions
>> > for a pull and add it back after some pulls, how do I achieve that?
>> >
>> > -Regards
>> > Sagar
>> >
>>
>
>

Mime
View raw message