flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chirag Dewan <chirag.dewa...@yahoo.in>
Subject Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes
Date Wed, 14 Feb 2018 14:32:54 GMT
Thanks Aljoscha.
I haven't checked that bit. Is there any configuration for TaskManagers to find ZK?

Sent from Yahoo Mail on Android 
  On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek<aljoscha@apache.org> wrote:   Do
you see in the logs whether the TaskManager correctly connect to ZooKeeper as well? They need
this in order to find the JobManager leader.

On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewan22@yahoo.in> wrote:
I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For JobManager HA, I
have started a 3 node zookeeper service on the same swarm network and configured Flink's zookeeper
quorum with zookeeper service instances. 
JobManager gets started with the LeaderElectionService and gets assigned a LeaderSessionID
too, which I can see from the following log statements(attaching only related logs) :
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService 
 org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService 
- Starting ZooKeeperLeaderRetrievalService.JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager
was granted leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
 Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager Resource
Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#590681231]
- leader session 1f3b2ec6-77b6-4532-928f-ad8befd5202f

But TaskManagers are not able to register with the JobManager and gives the following error:
Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3
@ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, managed=324208384,1))
because the expected leader session ID 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal
the received leader session ID 00000000-0000-0000-0000-000000000000.

Seems like the ResourceManager was not able to retrieve the LeaderSessionID and passed 00
One interesting thing I observed was a ZK version log:
The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT
will be used instead.

Is this a ZK version problem? Should I be using ZK 3.4.6?
My configuration:
Flink Version : 1.4.0ZK version : 3.4.11 (I just pulled the latest image)
Thanks in advance. 


View raw message