spark-user mailing list archives

From Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com>
Subject Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.
Date Wed, 27 Feb 2019 06:38:18 GMT
Hi Guillermo,

What was the interval between restarts of the Spark job? As a feature in
Kafka, the broker deletes the committed offsets of a consumer group after a
period of inactivity (`offsets.retention.minutes`, which defaults to 24
hours in Kafka 1.x).
In that case, a newly started Spark Streaming job with the same groupId
finds no committed offsets, falls back to its `auto.offset.reset` setting,
and reads from the beginning.
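The expiry behaviour above can be sketched as a small, hypothetical helper
(not part of Kafka or Spark; the function name and dates are made up for
illustration), assuming the Kafka 1.x broker default of 24 hours:

```python
from datetime import datetime, timedelta

# Kafka 1.x broker default: offsets.retention.minutes = 1440 (24 hours).
OFFSET_RETENTION = timedelta(minutes=1440)

def offsets_expired(last_commit: datetime, restart_time: datetime,
                    retention: timedelta = OFFSET_RETENTION) -> bool:
    """Return True if the broker would already have deleted the group's
    committed offsets by the time the streaming job restarts."""
    return restart_time - last_commit > retention

# Restarting 2 hours after the last commit: the offsets are still there.
print(offsets_expired(datetime(2019, 2, 20, 10, 0),
                      datetime(2019, 2, 20, 12, 0)))  # False
# After a 3-day gap the offsets are gone and the job reads from the beginning.
print(offsets_expired(datetime(2019, 2, 20, 10, 0),
                      datetime(2019, 2, 23, 10, 0)))  # True
```

This is only a model of the broker's behaviour, but it shows why a rarely
restarted job with long gaps between runs can intermittently "lose" its
position while a frequently restarted one never does.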

Akshay Bhardwaj
+91-97111-33849


On Thu, Feb 21, 2019 at 9:08 PM Gabor Somogyi <gabor.g.somogyi@gmail.com>
wrote:

> From the info you've provided, there's not much to say.
> Maybe you could collect a sample app, logs, etc., open a JIRA, and we can
> take a deeper look at it...
>
> BR,
> G
>
>
> On Thu, Feb 21, 2019 at 4:14 PM Guillermo Ortiz <konstt2000@gmail.com>
> wrote:
>
>> I'm working with Spark Streaming 2.0.2 and Kafka 1.0.0, using the Direct
>> Stream connector. I consume data from Kafka and commit the offsets
>> automatically.
>> I can see in the logs that Spark commits the last processed offsets.
>> Sometimes, when I restart Spark, it starts from the beginning, even though
>> I'm using the same groupId.
>>
>> Why could this happen? It only happens rarely.
>>
>
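One way to check whether the broker still holds the group's offsets is the
standard CLI shipped with Kafka 1.0.0. A sketch, assuming a broker on
localhost:9092 and a group id of `my-streaming-group` (both placeholders for
your own values):

```shell
# Describe the committed offsets for the consumer group. If the group is
# missing, or the offset columns are empty, the broker has expired them.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-streaming-group

# The broker-side expiry is set in server.properties, e.g.:
#   offsets.retention.minutes=10080   # keep offsets for 7 days instead of 1
```

Raising `offsets.retention.minutes` above the longest expected downtime of
the job is a common way to avoid the restart-from-beginning behaviour.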
