flink-user mailing list archives

From Chirag Dewan <chirag.dewa...@yahoo.in>
Subject Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes
Date Thu, 15 Feb 2018 03:59:33 GMT
 Thanks a lot Aljoscha.
I was making a silly mistake. The TaskManagers can now register with the JobManager.
One more thing: does Flink now store JobGraphs in ZK too?
Regards,
Chirag
On Wednesday, 14 February, 2018, 8:06:14 PM IST, Aljoscha Krettek <aljoscha@apache.org> wrote:
 
 It should be roughly the same settings that you use in your JobManager. They are described
here: https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode
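
A minimal sketch of what "the same settings" can mean in practice: compare the HA entries in the JobManager's and the TaskManager's flink-conf.yaml and list the keys that differ. The key names are the ZooKeeper HA options as I understand them from the Flink 1.4 configuration page linked above, so please verify them against that page. The hostnames, the storage path, and the helper names `parse_flink_conf` and `ha_mismatches` are hypothetical.

```python
# Sketch only: checks that a TaskManager config carries the same
# ZooKeeper HA entries as the JobManager config.
# Assumption: key names as listed in the Flink 1.4 config docs.
HA_KEYS = (
    "high-availability",
    "high-availability.zookeeper.quorum",
    "high-availability.storageDir",
    "high-availability.zookeeper.path.root",
    "high-availability.cluster-id",
)


def parse_flink_conf(text):
    """Parse flat 'key: value' lines; ignores blanks and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so 'zk1:2181' stays intact.
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def ha_mismatches(jm_text, tm_text):
    """Return the HA keys whose values differ between the two configs."""
    jm = parse_flink_conf(jm_text)
    tm = parse_flink_conf(tm_text)
    return [key for key in HA_KEYS if jm.get(key) != tm.get(key)]


# Hypothetical example values, not taken from the thread.
jm_conf = """
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: file:///flink/ha
"""
tm_conf = """
jobmanager.rpc.address: jobmanager
"""

print(ha_mismatches(jm_conf, tm_conf))
```

With the sample values, the TaskManager config is missing the three HA entries the JobManager sets, so those three keys are reported. An empty list means the HA settings match.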


On 14. Feb 2018, at 15:32, Chirag Dewan <chirag.dewan22@yahoo.in> wrote:
Thanks Aljoscha.
I haven't checked that bit. Is there any configuration for TaskManagers to find ZK?
Regards,
Chirag

 
On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek <aljoscha@apache.org> wrote:
Do you see in the logs whether the TaskManagers correctly connect to ZooKeeper as well? They need
this in order to find the JobManager leader.
Best,
Aljoscha


On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewan22@yahoo.in> wrote:
Hi,
I am trying to deploy a Flink cluster (1 JM, 2 TMs) on Docker Swarm. For JobManager HA, I
have started a 3-node ZooKeeper service on the same swarm network and configured Flink's ZooKeeper
quorum with the ZooKeeper service instances.
The JobManager starts with the LeaderElectionService and is assigned a leader session ID,
which I can see from the following log statements (attaching only the related logs):
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.
JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager
Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#590681231] - leader session 1f3b2ec6-77b6-4532-928f-ad8befd5202f

But the TaskManagers are not able to register with the JobManager, and the following error is logged:
Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3
@ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, managed=324208384,1))
because the expected leader session ID 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal
the received leader session ID 00000000-0000-0000-0000-000000000000.
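
The received ID in this message is the all-zero (nil) UUID. A small sketch of how one might pull the two session IDs out of such a log line and flag the nil case follows. The regular expression is based only on the wording of the message quoted here, and the helper name `diagnose` is hypothetical. Reading a nil ID as "the sender never obtained the leader session ID from ZooKeeper" is an assumption that fits the replies in this thread; the log line itself does not state it.

```python
import re
import uuid

# Matches the wording of the "Discard message" log quoted in this thread.
# \s+ also tolerates the line wrap in the middle of the message.
SESSION_RE = re.compile(
    r"expected leader session ID\s+([0-9a-fA-F-]{36})\s+did not equal\s+"
    r"the received leader session ID\s+([0-9a-fA-F-]{36})"
)


def diagnose(log_text):
    """Return the expected/received IDs, or None if the message is absent."""
    match = SESSION_RE.search(log_text)
    if match is None:
        return None
    expected = uuid.UUID(match.group(1))
    received = uuid.UUID(match.group(2))
    return {
        "expected": expected,
        "received": received,
        # A nil UUID has the integer value 0.
        "received_is_nil": received.int == 0,
    }


sample = (
    "because the expected leader session ID "
    "1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal\n"
    "the received leader session ID 00000000-0000-0000-0000-000000000000."
)

result = diagnose(sample)
print(result["expected"], result["received_is_nil"])
```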

It seems the ResourceManager was not able to retrieve the leader session ID and passed the
all-zero ID instead.
One interesting thing I observed was a ZK version log:
The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT
will be used instead.

Is this a ZK version problem? Should I be using ZK 3.4.6?
My configuration:
Flink version: 1.4.0
ZK version: 3.4.11 (I just pulled the latest image)
Thanks in advance. 
Chirag


  


  