flink-user mailing list archives

From Krzysztof Białek <krzysiek.bia...@gmail.com>
Subject Re: Flink + Consul as HA backend. What do you think?
Date Thu, 15 Feb 2018 10:08:53 GMT
Alright, I just came across the first real-life problem with my Consul HA
implementation.
The Consul KV store has a limit of 512 kB per value, and the JobGraph of one
of my apps exceeded it.
ZK seems to have a similar limit of 1 MB per znode.
How did you work around it? Or maybe I am serializing the JobGraph wrong?
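
One possible way around a per-value limit is to keep the large serialized
payload outside the KV store and store only a small pointer under the key.
Flink's ZooKeeper HA mode appears to follow a similar pattern (the state goes
to high-availability.storageDir and only a handle goes to ZooKeeper), but how
closely the sketch below matches that is an assumption. Everything in it is
hypothetical: the PointerStore class, the in-memory map standing in for the
Consul KV store, and the local directory standing in for shared storage.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: large payloads go to a file, the KV store keeps only the path. */
final class PointerStore {
    // The Consul per-value limit mentioned above (512 kB).
    static final int KV_VALUE_LIMIT_BYTES = 512 * 1024;

    // Stand-in for the Consul KV store; a real implementation would call a Consul client.
    private final Map<String, byte[]> kv = new HashMap<>();

    // Stand-in for storage that every JobManager can read (e.g. a shared file system).
    private final Path storageDir;

    PointerStore(Path storageDir) {
        this.storageDir = storageDir;
    }

    /** Writes the payload to a new file, then stores the file path under the key. */
    void put(String key, byte[] payload) throws IOException {
        String fileName = key.replace('/', '_') + "-" + UUID.randomUUID();
        Path file = storageDir.resolve(fileName);
        Files.write(file, payload);

        byte[] pointer = file.toString().getBytes(StandardCharsets.UTF_8);
        if (pointer.length > KV_VALUE_LIMIT_BYTES) {
            throw new IOException("Pointer itself exceeds the KV value limit");
        }
        kv.put(key, pointer);
    }

    /** Resolves the pointer stored under the key and reads the payload back. */
    byte[] get(String key) throws IOException {
        byte[] pointer = kv.get(key);
        if (pointer == null) {
            return null;
        }
        Path file = Paths.get(new String(pointer, StandardCharsets.UTF_8));
        return Files.readAllBytes(file);
    }

    /** Size in bytes of what is actually held in the KV store for the key, or -1 if absent. */
    int storedValueSize(String key) {
        byte[] pointer = kv.get(key);
        return pointer == null ? -1 : pointer.length;
    }
}
```

With this layout a 2 MB serialized JobGraph would occupy only a path-sized
value in the KV store. The sketch does not cover cleanup of orphaned files or
atomicity between the file write and the KV update, which a real HA backend
would need.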

On Thu, Feb 15, 2018 at 8:47 AM, Krzysztof Białek <krzysiek.bialek@gmail.com> wrote:

> I have very little experience with ZK and cannot explain the differences
> between ZK and Consul myself. However, there are some comparisons
> available:
> * https://www.consul.io/intro/vs/zookeeper.html - written by the Consul
> team, so it may be biased
> * https://www.slideshare.net/IvanGlushkov/zookeeper-vs-consul-41882991
> * https://jakon.me/2017/01/consul-deployment-orchestration/
>
> Regarding testing: I ran basic failover scenarios on my workstation with
> 2 JobManagers, 2 TaskManagers and the WindowJoin example app with
> checkpointing and restarting turned on.
> The cluster never ran for longer than a few hours.
>
> For now I'd like to open Flink up to alternative HA backends (
> https://issues.apache.org/jira/browse/FLINK-8660)
>
>
> On Wed, Feb 14, 2018 at 1:47 PM, Chesnay Schepler <chesnay@apache.org>
> wrote:
>
>> Hello,
>>
>> I don't know anything about Consul, but the prospect of having other
>> options besides Zookeeper is very interesting. It's rather surprising how
>> little you had to modify existing classes to get this to work.
>>
>> It may take a bit until someone provides proper feedback as the community
>> is currently prepping 2 releases (1.4.1 and 1.5), please don't be
>> discouraged by this.
>>
>> I saw that your branch was based on the 1.4 version. In 1.5 we reworked
>> the distributed architecture of Flink (in an initiative commonly referred
>> to as FLIP-6) which may affect your work.
>>
>> 2 things to note from my side:
>> It would also be helpful if you could explain the differences between ZK
>> and Consul and how they stack up in terms of guarantees etc.
>> How did you test your solution so far? (For example, how long was a
>> cluster running, and which failure scenarios did you cover?)
>>
>>
>> On 13.02.2018 21:38, Krzysztof Białek wrote:
>>
>> I'd like to get your opinion about this idea. I found the related JIRA
>> issue FLINK-2366, but it seems to be dead. To get your attention, I am
>> copying my comment here.
>>
>> As an experiment I've implemented Flink HA on top of Consul. The
>> implementation is working fine in the "lab" but is not battle tested yet.
>> The source code is available at
>> https://github.com/kbialek/flink/tree/feature/consul
>> (flink-runtime package org.apache.flink.runtime.consul)
>>
>> Why? Generally I'd like to keep as few moving parts as possible. We do
>> not have Zookeeper running, but Consul is already in place. And in the end
>> freedom of choice is a good thing.
>>
>> It would be great to see built-in Consul support in Flink someday, but if
>> that is not planned, then I suggest a small refactoring that makes
>> HighAvailabilityServicesFactory configurable. As far as I can see, this
>> should be enough to inject any HA implementation.
>>
>> Regards,
>> Krzysztof
>>
>>
>>
>
