flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: High back-pressure after recovering from a save point
Date Fri, 14 Jul 2017 17:43:23 GMT
It will work if you assign a new uid to the Kafka source.

Gyula

On Fri, Jul 14, 2017, 18:42 Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:

> One thing: do note that `FlinkKafkaConsumer#setStartFromLatest()` does not
> have any effect when starting from savepoints.
> i.e., the consumer will still start from whatever offset is written in the
> savepoint.
>
>
> On 15 July 2017 at 12:38:10 AM, Tzu-Li (Gordon) Tai (tzulitai@apache.org)
> wrote:
>
> Can you try starting from the savepoint, but telling Kafka to start from
> the latest offset?
>
>
> (@gordon: Is that possible in Flink 1.3.1 or only in 1.4-SNAPSHOT ?)
>
> This is already possible in Flink 1.3.x.
> `FlinkKafkaConsumer#setStartFromLatest()` would be it.
>
> On 15 July 2017 at 12:33:53 AM, Stephan Ewen (sewen@apache.org) wrote:
>
> Can you try starting from the savepoint, but telling Kafka to start from
> the latest offset?
>
> (@gordon: Is that possible in Flink 1.3.1 or only in 1.4-SNAPSHOT ?)
>
> On Fri, Jul 14, 2017 at 11:18 AM, Kien Truong <duckientruong@gmail.com>
> wrote:
>
>> Hi,
>>
>> Sorry for the version typo, I'm running 1.3.1. I did not test with 1.2.x.
>>
>> The jobs runs fine with almost 0 back-pressure if it's started from
>> scratch or if I reuse the kafka consumers group id without specifying the
>> safe point.
>>
>> Best regards,
>> Kien
>> On Jul 14, 2017, at 15:59, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>> Hi!
>>>
>>> Flink 1.3.2 does not yet exist. Do you mean 1.3.1 or latest master?
>>>
>>> Can you tell us whether this occurs only in 1.3.x and worked well in
>>> 1.2.x?
>>> If you just keep the job running without savepoint/restore, you do not
>>> get into backpressure situations?
>>>
>>> Thanks,
>>> Stephan
>>>
>>>
>>> On Fri, Jul 14, 2017 at 1:15 AM, Kien Truong <duckientruong@gmail.com>
>>> wrote:
>>>
>>>> Hi Fabian,
>>>> This happens to me even when the restore is immediate, so there's not
>>>> much data in Kafka to catch up (5 minutes max)
>>>>
>>>> Regards
>>>> Kien
>>>> On Jul 13, 2017, at 23:40, Fabian Hueske < fhueske@gmail.com> wrote:
>>>>>
>>>>> I would guess that this is quite usual because the job has to
>>>>> "catch-up" work.
>>>>> For example, if you took a save point two days ago and restore the job
>>>>> today, the input data of the last two days has been written to Kafka
>>>>> (assuming Kafka as source) and needs to be processed.
>>>>> The job will now read as fast as possible from Kafka to catch-up to
>>>>> the presence. This means the data is much fast ingested (as fast as Kafka
>>>>> can read and ship it) than during regular processing (as fast as your
>>>>> sources produce).
>>>>> The processing speed is bound by your Flink job which means there will
>>>>> be backpressure.
>>>>>
>>>>> Once the job caught-up, the backpressure should disappear.
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>> 2017-07-13 15:48 GMT+02:00 Kien Truong <duckientruong@gmail.com>:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have one job where back-pressure  is significantly higher after
>>>>>> resuming from a save point.
>>>>>>
>>>>>> Because that job makes heavy use of stateful functions with
>>>>>> RocksDBStateBackend ,
>>>>>>
>>>>>> I'm suspecting that this is the cause of performance degradation.
>>>>>>
>>>>>> Does anyone encounter simillar issues or have any tips for debugging
?
>>>>>>
>>>>>>
>>>>>> I'm using Flink 1.3.2 with YARN in detached mode.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Kien
>>>>>>
>>>>>>
>>>>>
>>>
>

Mime
View raw message