nifi-users mailing list archives

From "Saip, Alexander (NIH/CC/BTRIS) [C]" <alexander.s...@nih.gov>
Subject RE: NiFi fails on cluster nodes
Date Fri, 12 Oct 2018 13:47:36 GMT
Thanks Mike. We will get an external ZooKeeper instance deployed. I guess co-locating it with
one of the NiFi nodes shouldn’t be an issue, should it? We are chronically short of hardware.
BTW, does the following message in the logs point to some sort of problem with the embedded
ZooKeeper?

2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED

From: Mike Thomsen <mikerthomsen@gmail.com>
Sent: Friday, October 12, 2018 8:33 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

Also, in a production environment NiFi should have its own dedicated ZooKeeper cluster to
be on the safe side. You should not share a ZooKeeper quorum between systems (e.g. have HBase
and NiFi point to the same quorum).

On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:
Alexander,

I am pretty sure your problem is here: nifi.state.management.embedded.zookeeper.start=true

That spins up an embedded ZooKeeper, which is generally intended to be used for local development.
HBase, for example, provides the same feature, intended to let you test a real HBase client
application against a single HBase node running locally.
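
Since the diagnosis hinges on a few nifi.properties entries, here is a minimal Python sketch (a hypothetical helper, not part of NiFi) for checking them on each node. It assumes the plain key=value layout of the shipped nifi.properties and is not a full java.util.Properties parser (no line continuations or escape sequences). The function names are made up for illustration; the property keys are NiFi's own.

```python
# Hypothetical helper, not part of NiFi: summarize ZooKeeper-related settings
# from nifi.properties. Assumes simple key=value lines (no continuations/escapes).

# Keys that decide whether a node runs an embedded ZooKeeper or uses an external one.
ZK_KEYS = (
    "nifi.cluster.is.node",
    "nifi.state.management.embedded.zookeeper.start",
    "nifi.zookeeper.connect.string",
)


def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def zookeeper_summary(props: dict) -> dict:
    """Return the ZooKeeper-related keys, with '' for any that are missing."""
    return {key: props.get(key, "") for key in ZK_KEYS}


def uses_embedded_zookeeper(props: dict) -> bool:
    """True if this node is configured to start NiFi's embedded ZooKeeper."""
    value = props.get("nifi.state.management.embedded.zookeeper.start", "false")
    return value.lower() == "true"


if __name__ == "__main__":
    # Inline sample for illustration; on a real node, read the file instead, e.g.
    # pathlib.Path("/opt/nifi-1.7.1/conf/nifi.properties").read_text()
    sample = """
# excerpt
nifi.cluster.is.node=true
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=
"""
    props = parse_properties(sample)
    for key, value in zookeeper_summary(props).items():
        print(f"{key}={value}")
    if uses_embedded_zookeeper(props):
        print("Embedded ZooKeeper is enabled on this node")
```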

Try these steps:

1. Set up an external ZooKeeper instance (or 3 in a quorum; a quorum should have an odd number of servers).
2. Update nifi.properties on each node to use the external ZooKeeper setup.
3. Restart all of them.
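
Steps 1 and 2 can be sketched in Python as follows. This is a hypothetical helper, not a NiFi tool; the hostnames are placeholders and 2181 is ZooKeeper's default client port. The arithmetic shows why an odd ensemble size is recommended: a quorum is a strict majority, so 4 servers tolerate no more failures than 3. The two property keys are NiFi's own; the ZooKeeper state provider in conf/state-management.xml also has a "Connect String" property, which will likely need the same value.

```python
# Hypothetical helper, not part of NiFi: size a ZooKeeper ensemble and build the
# nifi.properties entries that point a node at it. Hostnames are placeholders.


def quorum_size(ensemble: int) -> int:
    """Servers that must be up and agree: a strict majority of the ensemble."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Servers that can be lost while a majority is still available."""
    return ensemble - quorum_size(ensemble)


def external_zk_properties(hosts, client_port=2181):
    """Return nifi.properties entries for an external ensemble of `hosts`."""
    if not hosts or len(hosts) % 2 == 0:
        raise ValueError("use an odd number (1, 3, 5, ...) of ZooKeeper servers")
    connect = ",".join(f"{host}:{client_port}" for host in hosts)
    return {
        # Stop each NiFi node from launching its own embedded ZooKeeper.
        "nifi.state.management.embedded.zookeeper.start": "false",
        # Comma-separated host:port list of the external servers.
        "nifi.zookeeper.connect.string": connect,
    }


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
    hosts = ["zk1.example.org", "zk2.example.org", "zk3.example.org"]
    for key, value in external_zk_properties(hosts).items():
        print(f"{key}={value}")
```

The same entries would go into nifi.properties on every NiFi node before the restart in step 3.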

See if that works.

Mike

On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.saip@nih.gov> wrote:
nifi.cluster.node.protocol.port=11443 by default on all nodes; I haven’t touched that property.
Yesterday, we discovered some issues preventing two of the boxes from communicating. Now,
they can talk okay. Ports 11443, 2181 and 3888 are explicitly open in iptables, but clustering
still doesn’t happen. The log files are filled up with errors like this:

2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Is there anything else we should check?
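
When ConnectionLoss errors persist after firewall changes, a port-by-port TCP check from each node to the others can narrow things down. Below is a minimal standard-library Python sketch; hostnames are passed on the command line. Port 11443 is the nifi.cluster.node.protocol.port value mentioned above; 2181, 2888 and 3888 are ZooKeeper's conventional client, peer and leader-election ports, so check conf/zookeeper.properties for the values actually in use. Note that 2181 and 3888 are listed as open above but 2888 is not; also, only the current ZooKeeper leader listens on the peer port, so a closed 2888 on a follower is expected.

```python
# Minimal TCP reachability check (a "TCP ping") for the cluster ports.
# Usage: python check_ports.py node1 node2 node3   (hostnames are your own)
import socket
import sys

# Port -> purpose. 11443 comes from this thread; the ZooKeeper ports are the
# conventional defaults and may differ in your conf/zookeeper.properties.
PORTS = {
    11443: "NiFi cluster protocol",
    2181: "ZooKeeper client",
    2888: "ZooKeeper peer (listened on by the leader only)",
    3888: "ZooKeeper leader election",
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, unreachable network
        return False


def check_hosts(hosts, ports=PORTS, timeout=3.0):
    """Map (host, port) -> reachable? for every combination."""
    return {(host, port): port_open(host, port, timeout) for host in hosts for port in ports}


if __name__ == "__main__":
    hosts = sys.argv[1:]
    if not hosts:
        print("usage: check_ports.py HOST [HOST ...]")
    for (host, port), ok in check_hosts(hosts).items():
        print(f"{host}:{port:<5} {'open' if ok else 'CLOSED'}  ({PORTS[port]})")
```

Running it from every node against the other two shows whether a failure is one-directional, which iptables rules can easily produce.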

From: Nathan Gough <thenatog@gmail.com>
Sent: Thursday, October 11, 2018 9:12 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

You may also need to explicitly open the port set in ‘nifi.cluster.node.protocol.port’ on all
nodes to allow cluster communication (heartbeats, etc.).

From: ashmeet kandhari <ashmeetkandhari93@gmail.com>
Reply-To: <users@nifi.apache.org>
Date: Thursday, October 11, 2018 at 9:09 AM
To: <users@nifi.apache.org>
Subject: Re: NiFi fails on cluster nodes

Hi Alexander,

Can you verify that the 3 nodes can reach one another, e.g. with a TCP ping? Alternatively,
run NiFi in standalone mode and check that you can reach each instance from the other 2
servers, just to be sure they can communicate with one another.

On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.saip@nih.gov> wrote:
How do I do that? The nifi.properties file on each node includes ‘nifi.state.management.embedded.zookeeper.start=true’,
so I assume ZooKeeper does start.

From: ashmeet kandhari <ashmeetkandhari93@gmail.com>
Sent: Thursday, October 11, 2018 4:36 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

Can you check whether the ZooKeeper node is up and running and can connect to the NiFi nodes?
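
One way to check whether a ZooKeeper server is up is its "ruok" four-letter command, to which a running server replies "imok". Below is a minimal Python sketch; it assumes four-letter commands are enabled, which is the default in the ZooKeeper 3.4.x line but requires the 4lw.commands.whitelist setting in 3.5 and later. An "imok" only shows the server process is running and bound to its client port, not that it has joined a quorum; the "stat" command reports that.

```python
# Minimal ZooKeeper liveness probe using the "ruok" four-letter command.
import socket
import sys


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send 'ruok' to a ZooKeeper server; True if it answers 'imok'."""
    reply = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # Read until we have 4 bytes or the server closes the connection.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:  # refused, timed out, unresolvable host
        return False
    return reply == b"imok"


if __name__ == "__main__":
    # Usage: python zk_ruok.py ZK_HOST [PORT]   (the host is your own)
    if len(sys.argv) > 1:
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 2181
        print("imok" if zk_ruok(sys.argv[1], port) else "no answer")
```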

On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.saip@nih.gov> wrote:
Hello,

We have three NiFi 1.7.1 nodes originally configured as independent instances, each on its
own server. There is no firewall between them. When I tried to build a cluster following the
instructions at https://mintopsblog.com/2017/11/12/apache-nifi-cluster-configuration/, NiFi failed
to start on all of them, even though I set nifi.cluster.protocol.is.secure=false in the
nifi.properties file on each node. Here is the error from the log files:

2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
        at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
        at org.apache.nifi.NiFi.<init>(NiFi.java:102)
        at org.apache.nifi.NiFi.<init>(NiFi.java:71)
        at org.apache.nifi.NiFi.main(NiFi.java:292)
2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).

Without clustering, the instances had no problem starting. Since this is our first experiment
building a cluster, I’m not sure where to look for clues.

Thanks in advance,

Alexander