spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Koeninger <c...@koeninger.org>
Subject Re: Kafka stream offset management question
Date Tue, 08 Nov 2016 14:42:31 GMT
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html

specifically

http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#storing-offsets

Have you set enable.auto.commit to false?

The new consumer stores offsets in kafka, so the idea of specifically
deleting offsets for that group doesn't really make sense.

In other words

- set enable.auto.commit to false
- use a new group.id


On Tue, Nov 8, 2016 at 2:21 AM, Haopu Wang <HWang@qilinsoft.com> wrote:
> I'm using Kafka direct stream (auto.offset.reset = earliest) and enable
> Spark streaming's checkpoint.
>
>
>
> The application starts and consumes messages correctly. Then I stop the
> application and clean the checkpoint folder.
>
>
>
> I restart the application and expect it to consumes old messages. But it
> doesn't consume any data. And there are logs as below:
>
>
>
>          [org.apache.spark.streaming.kafka010.KafkaRDD] (Executor task
> launch worker-0;) Beginning offset 25 is the same as ending offset skipping
> aa 0
>
>
>
> So I think the offset is stored not only in checkpoint but also in Kafka,
> right?
>
> Is it because I'm using the same group.id? How can I delete the consumer
> group manually?
>
>
>
> Thanks again for any help!
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Mime
View raw message