flink-user mailing list archives

From Christophe Jolif <cjo...@gmail.com>
Subject Re: RocksDB / checkpoint questions
Date Sat, 03 Feb 2018 16:45:27 GMT
Thanks for sharing, Kien. That sounds like the logical behavior, but it is good to
hear it confirmed by your experience.

--
Christophe
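
For readers finding this thread later: the tolerance Kien describes can also be made explicit in configuration. The fragment below is a hypothetical flink-conf.yaml sketch (key names as in recent Flink releases; the checkpoint directory URI is a placeholder, not from this thread) that selects RocksDB and lets the job survive a few failed checkpoints:

```yaml
# Sketch, assuming a recent Flink release; the directory URI is a placeholder.
state.backend: rocksdb
# Durable, distributed storage for checkpoints (e.g. HDFS or S3):
state.checkpoints.dir: hdfs:///flink/checkpoints
# Tolerate this many failed checkpoints (e.g. while the distributed
# storage is unavailable) before failing the job:
execution.checkpointing.tolerable-failed-checkpoints: 3
```

With such a setting, checkpoints that fail while the distributed disk is down are counted against the tolerance, and the job keeps processing until either the storage is repaired or the limit is exceeded.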

On Sat, Feb 3, 2018 at 7:25 AM, Kien Truong <duckientruong@gmail.com> wrote:

> On Feb 3, 2018, at 10:48, Kien Truong <duckientruong@gmail.com> wrote:
>>
>> Hi,
>> Speaking from my experience, if the distributed disk fail, the checkpoint
>> will fail as well, but the job will continue running. The checkpoint
>> scheduler will keep running, so the first scheduled checkpoint after you
>> repair your disk should succeed.
>>
>> Of course, if you also write to the distributed disk inside your job,
>> then your job may crash too, but this is unrelated to the checkpoint
>> process.
>>
>> Best regards,
>> Kien
>>
>> On Feb 2, 2018, at 23:30, Christophe Jolif <cjolif@gmail.com> wrote:
>>>
>>> If I understand correctly, RocksDB uses two disks: the TaskManager's local
>>> disk for "local storage" of the state, and the distributed disk for
>>> checkpointing.
>>>
>>> Two questions:
>>>
>>> - if I have 3 TaskManagers, should I expect to find, more or less
>>> (depending on how the tasks are balanced), a third of my overall state
>>> stored on disk on each of these TaskManager nodes?
>>>
>>> - if the local node/disk fails, I will get the state back from the
>>> distributed disk, things will start again, and all is fine. However, what
>>> happens if the distributed disk fails? Will Flink continue processing
>>> while waiting for me to mount a new distributed disk, or will it stop?
>>> Could I lose data or reprocess events under that condition?
>>>
>>> --
>>> Christophe Jolif
>>>
>>
