flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes
Date Thu, 15 Feb 2018 09:09:40 GMT
Hi,

AFAIK, the JobGraph itself is not stored in ZK but in HDFS. ZK only stores a handle to the
serialised JobGraph.
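
A minimal sketch of that "handle in ZK, payload elsewhere" pattern, assuming nothing about Flink's actual classes: `HandleStore`, its `coordination` dict (a stand-in for ZooKeeper znodes), and the temporary directory (a stand-in for HDFS or whatever file system the HA storage directory points at) are all hypothetical names used only for illustration. One likely motivation is that ZooKeeper limits znode data to about 1 MB by default, so only a small pointer goes there.

```python
import os
import pickle
import tempfile
import uuid


class HandleStore:
    """Toy model: large payload on a file system, small pointer in a coordination store."""

    def __init__(self, storage_dir):
        self.storage_dir = storage_dir   # stand-in for the HA storage directory
        self.coordination = {}           # stand-in for ZooKeeper: name -> small handle

    def put(self, name, payload):
        # Write the (potentially large) serialised payload to the file system.
        path = os.path.join(self.storage_dir, f"{name}-{uuid.uuid4().hex}")
        with open(path, "wb") as f:
            pickle.dump(payload, f)
        # Keep only the path, a few dozen bytes, in the coordination store.
        self.coordination[name] = path.encode("utf-8")
        return path

    def get(self, name):
        # Recovery: read the handle first, then load the payload it points to.
        path = self.coordination[name].decode("utf-8")
        with open(path, "rb") as f:
            return pickle.load(f)


storage_dir = tempfile.mkdtemp()
store = HandleStore(storage_dir)

# Hypothetical "job graph" with a deliberately large field (about 2 MB).
job_graph = {"vertices": ["source", "map", "sink"], "blob": "x" * 2_000_000}
store.put("job-1", job_graph)

handle_size = len(store.coordination["job-1"])
restored = store.get("job-1")
```

The handle stays tiny no matter how large the serialised payload grows, which is the point of the indirection.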

Best,
Aljoscha

> On 15. Feb 2018, at 04:59, Chirag Dewan <chirag.dewan22@yahoo.in> wrote:
> 
> Thanks a lot Aljoscha.
> 
> I was making a silly mistake. The TaskManagers can now register with the JobManager.
> 
> One more thing: does Flink now store JobGraphs in ZK too?
> 
> Regards,
> 
> Chirag
> 
> On Wednesday, 14 February, 2018, 8:06:14 PM IST, Aljoscha Krettek <aljoscha@apache.org> wrote:
> 
> 
> It should be roughly the same settings that you use in your JobManager. They are described
> here: https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode
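
One way to check that the TaskManager really carries the same HA settings as the JobManager is to compare the relevant keys of the two `flink-conf.yaml` files. The following is only a sketch: it assumes the flat `key: value` layout of `flink-conf.yaml`, the key names listed in `HA_KEYS` should be verified against the linked documentation for the Flink version in use, and the host names and storage path are hypothetical.

```python
# HA-related keys to compare (assumed; check them against the docs for your version).
HA_KEYS = (
    "high-availability",
    "high-availability.zookeeper.quorum",
    "high-availability.storageDir",
)


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        conf[key.strip()] = value.strip()
    return conf


def ha_mismatches(jm_text, tm_text):
    """Return {key: (jobmanager_value, taskmanager_value)} for every HA key that differs."""
    jm, tm = parse_flat_conf(jm_text), parse_flat_conf(tm_text)
    return {k: (jm.get(k), tm.get(k)) for k in HA_KEYS if jm.get(k) != tm.get(k)}


# Hypothetical configs: the TaskManager image was built without the HA settings.
jm_conf = """
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha
"""
tm_conf = """
jobmanager.rpc.address: jobmanager
"""

mismatches = ha_mismatches(jm_conf, tm_conf)
for key, (jm_value, tm_value) in mismatches.items():
    print(f"{key}: JobManager={jm_value!r} TaskManager={tm_value!r}")
```

An empty result means both sides agree on the HA keys; any entry points at a setting the TaskManager is missing or has set differently.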
> 
>> On 14. Feb 2018, at 15:32, Chirag Dewan <chirag.dewan22@yahoo.in> wrote:
>> 
>> Thanks Aljoscha.
>> 
>> I haven't checked that bit. Is there any configuration for TaskManagers to find ZK?
>> 
>> Regards,
>> 
>> Chirag
>> 
>> On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek <aljoscha@apache.org> wrote:
>> Do you see in the logs whether the TaskManagers correctly connect to ZooKeeper as
>> well? They need this in order to find the JobManager leader.
>> 
>> Best,
>> Aljoscha
>> 
>>> On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewan22@yahoo.in> wrote:
>>> 
>>> Hi,
>>> 
>>> I am trying to deploy a Flink cluster (1 JM, 2 TMs) on Docker Swarm. For JobManager
>>> HA, I have started a 3-node ZooKeeper service on the same swarm network and configured Flink's
>>> ZooKeeper quorum with the ZooKeeper service instances.
>>> 
>>> The JobManager starts with the LeaderElectionService and is assigned a LeaderSessionID
>>> too, which I can see from the following log statements (attaching only the related logs):
>>> 
>>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService
>>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.
>>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.
>>> JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
>>> Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager
>>> Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#590681231] - leader session 1f3b2ec6-77b6-4532-928f-ad8befd5202f
>>> 
>>> 
>>> But the TaskManagers are not able to register with the JobManager and give the following
>>> error:
>>> 
>>> Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3 @ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, managed=324208384,1)) because the expected leader session ID 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal the received leader session ID 00000000-0000-0000-0000-000000000000.
>>> 
>>> It seems the ResourceManager was not able to retrieve the LeaderSessionID and
>>> passed the all-zero ID instead.
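
The "Discard message" line can be read as a session-ID check failing on the receiving side. Below is a toy model of that check, not Flink's implementation: the `Leader` class and the `tm-1`/`tm-2` names are hypothetical. It assumes, in line with how the thread was resolved, that a sender which never learned the leader's session ID from ZooKeeper falls back to the all-zero UUID.

```python
import uuid

# The all-zero (nil) UUID, as seen in the discarded message.
NIL_SESSION_ID = uuid.UUID(int=0)


class Leader:
    """Toy leader that only accepts messages tagged with its current session ID."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.registered = []

    def receive(self, sender_session_id, task_manager_id):
        # A mismatching session ID means the sender does not know the
        # current leader, so the message is dropped rather than processed.
        if sender_session_id != self.session_id:
            print(
                f"Discard message from {task_manager_id}: expected "
                f"{self.session_id}, received {sender_session_id}"
            )
            return False
        self.registered.append(task_manager_id)
        return True


# Session ID taken from the log excerpt above.
leader = Leader(uuid.UUID("1f3b2ec6-77b6-4532-928f-ad8befd5202f"))

# A sender that never looked up the leader sends the nil ID and is rejected.
stale = leader.receive(NIL_SESSION_ID, "tm-1")

# A sender that retrieved the current session ID is accepted.
ok = leader.receive(leader.session_id, "tm-2")
```

Under this reading, the fix is on the sender's side: it has to be able to retrieve the current leader and session ID, which in ZooKeeper HA mode means it needs working ZooKeeper settings of its own.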
>>> 
>>> One interesting thing I observed was a ZK version log:
>>> 
>>> The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
>>> 
>>> Is this a ZK version problem? Should I be using ZK 3.4.6?
>>> 
>>> My configuration:
>>> 
>>> Flink Version : 1.4.0
>>> ZK version : 3.4.11 (I just pulled the latest image)
>>> 
>>> Thanks in advance. 
>>> 
>>> Chirag
>>> 
>> 
> 

