spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Spark Streaming - Problem managing Kafka offsets: stream starts from the beginning.
Date Wed, 27 Feb 2019 11:17:04 GMT
Hi Akshay,

The feature you've mentioned has a default value of 7 days...

BR,
G
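[Editor's note: the broker setting being discussed here is most likely `offsets.retention.minutes`, which controls how long Kafka retains committed consumer-group offsets. A minimal broker config fragment is sketched below; the values are illustrative, and the default changed between Kafka versions (roughly 1440 minutes, i.e. 24 hours, on older brokers such as 1.x, and 10080 minutes, i.e. 7 days, on newer ones), so verify against the documentation for your broker version.]

```properties
# server.properties (Kafka broker) -- illustrative fragment, not a full config.
# Committed consumer-group offsets are deleted after this retention window;
# a consumer restarting after expiry falls back to its auto.offset.reset policy.
offsets.retention.minutes=10080
```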


On Wed, Feb 27, 2019 at 7:38 AM Akshay Bhardwaj <
akshay.bhardwaj1988@gmail.com> wrote:

> Hi Guillermo,
>
> What was the interval between restarts of the Spark job? As a feature in
> Kafka, a broker deletes the committed offsets of a consumer group after an
> inactivity period of 24 hours.
> In such a case, a newly started Spark Streaming job will read offsets from
> the beginning for the same groupId.
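[Editor's note: the behaviour described above can be illustrated with a small, self-contained Python simulation. This is not Kafka or Spark code; `FakeBroker`, `starting_offset`, and all names are invented for illustration. It models a broker that expires committed offsets after a retention window, and a restarting consumer that falls back to its `auto.offset.reset` policy ("earliest" means read from the beginning) when no committed offset survives.]

```python
class FakeBroker:
    """Toy model of a broker that expires committed offsets after a retention window."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.committed = {}  # group_id -> (offset, commit_time)

    def commit(self, group_id, offset, now):
        self.committed[group_id] = (offset, now)

    def fetch_committed(self, group_id, now):
        entry = self.committed.get(group_id)
        if entry is None:
            return None
        offset, committed_at = entry
        if now - committed_at > self.retention_seconds:
            # Retention expired: the broker has deleted this group's offsets.
            del self.committed[group_id]
            return None
        return offset


def starting_offset(broker, group_id, now, auto_offset_reset="earliest"):
    """Where a restarted consumer begins: its committed offset if still retained,
    otherwise the auto.offset.reset policy ('earliest' => offset 0, the beginning)."""
    committed = broker.fetch_committed(group_id, now)
    if committed is not None:
        return committed
    return 0 if auto_offset_reset == "earliest" else -1  # -1 stands in for 'latest'


broker = FakeBroker(retention_seconds=24 * 3600)
broker.commit("my-spark-group", offset=500, now=0)
print(starting_offset(broker, "my-spark-group", now=3600))       # -> 500 (within retention)
print(starting_offset(broker, "my-spark-group", now=48 * 3600))  # -> 0   (expired: from the beginning)
```

A restart within the retention window resumes at the committed offset; a restart after it silently resets to the beginning, which matches the symptom reported below.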
>
> Akshay Bhardwaj
> +91-97111-33849
>
>
> On Thu, Feb 21, 2019 at 9:08 PM Gabor Somogyi <gabor.g.somogyi@gmail.com>
> wrote:
>
>> From the info you've provided not much to say.
>> Maybe you could collect sample app, logs etc, open a jira and we can take
>> a deeper look at it...
>>
>> BR,
>> G
>>
>>
>> On Thu, Feb 21, 2019 at 4:14 PM Guillermo Ortiz <konstt2000@gmail.com>
>> wrote:
>>
>>> I'm working with Spark Streaming 2.0.2 and Kafka 1.0.0, using Direct
>>> Stream as the connector. I consume data from Kafka and auto-save the
>>> offsets. I can see Spark committing the last processed offsets in the
>>> logs. Sometimes, when I restart Spark, it starts from the beginning even
>>> though I'm using the same groupId.
>>>
>>> Why could this happen? It only happens rarely.
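[Editor's note: since the replies trace this restart-from-the-beginning behaviour to broker-side offset expiry, one common mitigation is to persist offsets in your own store outside Kafka, so a restart never depends on broker retention. The sketch below is a language-agnostic illustration in plain Python, not the Spark API; the file path, JSON format, and function names are all assumptions for the example.]

```python
import json
import os
import tempfile


def save_offsets(path, offsets):
    """Atomically persist topic-partition offsets (e.g. after each micro-batch)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, path)  # atomic rename so a crash never leaves a half-written file


def load_offsets(path, default):
    """On (re)start, read the last saved offsets; fall back to a default on first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


path = os.path.join(tempfile.mkdtemp(), "offsets.json")
print(load_offsets(path, {"topic-0": 0}))   # first start: no file yet -> default
save_offsets(path, {"topic-0": 1234})       # commit after processing a batch
print(load_offsets(path, {"topic-0": 0}))   # restart resumes from the saved offsets
```

In a real job the saved offsets would be handed back to the Kafka consumer as starting positions on restart, making the job independent of `offsets.retention.minutes`.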
>>>
>>
