nifi-users mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: NiFi fails on cluster nodes
Date Fri, 12 Oct 2018 14:17:06 GMT
It could very well become a problem down the road. The reason ZooKeeper
usually runs on a dedicated machine is that it needs enough resources to
communicate reliably within its quorum, reconcile configuration changes,
and serve configuration details to clients.

That particular message is just a warning. From what I can tell, it's
telling you that no Cluster Coordinator has been elected yet and that the
node is going to try to do something about it. This is usually a problem
with embedded ZooKeeper, because by default each node points to the
ZooKeeper instance it fires up itself.

For a development environment, a VM with 2GB of RAM and 1-2 CPU cores
should be enough to run an external ZooKeeper.
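
For reference, a minimal standalone zoo.cfg for such a development VM could look like the following (the paths and host names are placeholders, not taken from this thread):

```properties
# zoo.cfg -- minimal standalone ZooKeeper for a development VM
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

# For a 3-node quorum, uncomment on every server and create a matching
# myid file (containing 1, 2, or 3) in each server's dataDir:
# initLimit=10
# syncLimit=5
# server.1=zk1:2888:3888
# server.2=zk2:2888:3888
# server.3=zk3:2888:3888
```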

On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C] <
alexander.saip@nih.gov> wrote:

> Thanks Mike. We will get an external ZooKeeper instance deployed. I guess
> co-locating it with one of the NiFi nodes shouldn’t be an issue, should it?
> We are chronically short of hardware. BTW, does the following message
> in the logs point to some sort of problem with the embedded ZooKeeper?
>
>
>
> 2018-10-12 08:21:35,838 WARN [main]
> o.a.nifi.controller.StandardFlowService There is currently no Cluster
> Coordinator. This often happens upon restart of NiFi when running an
> embedded ZooKeeper. Will register this node to become the active Cluster
> Coordinator and will attempt to connect to cluster again
>
> 2018-10-12 08:21:35,838 INFO [main]
> o.a.n.c.l.e.CuratorLeaderElectionManager
> CuratorLeaderElectionManager[stopped=false] Attempted to register Leader
> Election for role 'Cluster Coordinator' but this role is already registered
>
> 2018-10-12 08:21:42,090 INFO [Curator-Framework-0]
> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>
> 2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager
> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b
> Connection State changed to SUSPENDED
>
>
>
> *From:* Mike Thomsen <mikerthomsen@gmail.com>
> *Sent:* Friday, October 12, 2018 8:33 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi fails on cluster nodes
>
>
>
> Also, in a production environment NiFi should have its own dedicated
> ZooKeeper cluster to be on the safe side. You should not reuse ZooKeeper
> quora (ex. have HBase and NiFi point to the same quorum).
>
>
>
> On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <mikerthomsen@gmail.com>
> wrote:
>
> Alexander,
>
>
>
> I am pretty sure your problem is here:
> *nifi.state.management.embedded.zookeeper.start=true*
>
>
>
> That spins up an embedded ZooKeeper, which is generally intended to be
> used for local development. For example, HBase provides the same feature,
> but it is intended to allow you to test a real HBase client application
> against a single node of HBase running locally.
>
>
>
> What you need to try are these steps:
>
>
>
> 1. Set up an external ZooKeeper instance (or a quorum of three; quorums
> must have an odd number of nodes)
>
> 2. Update nifi.properties on each node to use the external ZooKeeper setup.
>
> 3. Restart all of them.
>
>
>
> See if that works.
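
For step 2, the relevant nifi.properties changes would look roughly like this on every node (the host names are placeholders; depending on the NiFi version, the same connect string may also need to be set in conf/state-management.xml for the cluster state provider):

```properties
# nifi.properties -- use the external quorum instead of the embedded one
nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
```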
>
>
>
> Mike
>
>
>
> On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C] <
> alexander.saip@nih.gov> wrote:
>
> *nifi.cluster.node.protocol.port=11443* by default on all nodes, I
> haven’t touched that property. Yesterday, we discovered some issues
> preventing two of the boxes from communicating. Now, they can talk okay.
> Ports 11443, 2181 and 3888 are explicitly open in *iptables*, but
> clustering still doesn’t happen. The log files are filled up with errors
> like this:
>
>
>
> 2018-10-12 07:59:08,494 ERROR [Curator-Framework-0]
> o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
>
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss
>
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>
>         at
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
>         at java.lang.Thread.run(Thread.java:748)
>
>
>
> Is there anything else we should check?
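
Since iptables is in play, it may be worth confirming that every required port is actually reachable from each node. Note that ZooKeeper peers also talk on a quorum port (2888 by default, from `server.N=host:2888:3888` entries), which is not among the ports listed above. A quick sketch of such a check (the host names are placeholders; the port list assumes the defaults mentioned in this thread):

```python
import socket

# Placeholder host names -- substitute the three actual servers.
NODES = ["node1", "node2", "node3"]

# 11443 = nifi.cluster.node.protocol.port, 2181 = ZooKeeper client port,
# 2888 = ZooKeeper quorum port, 3888 = ZooKeeper leader-election port.
PORTS = [11443, 2181, 2888, 3888]

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in NODES:
    for port in PORTS:
        status = "open" if port_open(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Running this from each of the three servers would show quickly whether the firewall rules behave the same in every direction.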
>
>
>
> *From:* Nathan Gough <thenatog@gmail.com>
> *Sent:* Thursday, October 11, 2018 9:12 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi fails on cluster nodes
>
>
>
> You may also need to explicitly open ‘nifi.cluster.node.protocol.port’ on
> all nodes to allow cluster communication for cluster heartbeats etc.
>
>
>
> *From: *ashmeet kandhari <ashmeetkandhari93@gmail.com>
> *Reply-To: *<users@nifi.apache.org>
> *Date: *Thursday, October 11, 2018 at 9:09 AM
> *To: *<users@nifi.apache.org>
> *Subject: *Re: NiFi fails on cluster nodes
>
>
>
> Hi Alexander,
>
>
>
> Can you verify that the 3 nodes can reach one another (a TCP ping), or run
> NiFi in standalone mode and check that each server can be reached from the
> other 2, just to be sure they can communicate with one another?
>
>
>
> On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C] <
> alexander.saip@nih.gov> wrote:
>
> How do I do that? The *nifi.properties* file on each node includes
> *nifi.state.management.embedded.zookeeper.start=true*, so I assume
> ZooKeeper does start.
>
>
>
> *From:* ashmeet kandhari <ashmeetkandhari93@gmail.com>
> *Sent:* Thursday, October 11, 2018 4:36 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi fails on cluster nodes
>
>
>
> Can you check whether the ZooKeeper node is up and running, and whether
> the NiFi nodes can connect to it?
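
One way to perform that check from another machine is a sketch like the following (not from the thread: the host names are placeholders, and it assumes the ZooKeeper build still answers the 'ruok' four-letter-word command -- ZooKeeper 3.5+ requires whitelisting it via 4lw.commands.whitelist):

```python
import socket

def zookeeper_ruok(host, port=2181, timeout=5.0):
    """Send ZooKeeper's 'ruok' health command; True means it answered 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # tell the server we're done sending
            return sock.recv(16) == b"imok"
    except OSError:
        return False

# Placeholder host names -- substitute the actual NiFi/ZooKeeper servers.
for node in ("node1", "node2", "node3"):
    print(node, "imok" if zookeeper_ruok(node) else "no answer")
```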
>
>
>
> On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] <
> alexander.saip@nih.gov> wrote:
>
> Hello,
>
>
>
> We have three NiFi 1.7.1 nodes originally configured as independent
> instances, each on its own server. There is no firewall between them. When
> I tried to build a cluster following instructions here
> <https://mintopsblog.com/2017/11/12/apache-nifi-cluster-configuration/>,
> NiFi failed to start on all of them, despite the fact that I even set *
> nifi.cluster.protocol.is.secure=false* in the *nifi.properties* file on
> each node. Here is the error in the log files:
>
>
>
> 2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
>
> 2018-10-10 13:57:07,745 INFO [main]
> o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties
> path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
>
> 2018-10-10 13:57:07,748 INFO [main]
> o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from
> /opt/nifi-1.7.1/./conf/nifi.properties
>
> 2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125
> properties
>
> 2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener
> Started Bootstrap Listener, Listening for incoming requests on port 43744
>
> 2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to
> launch NiFi due to java.net.ConnectException: Connection timed out
> (Connection timed out)
>
> java.net.ConnectException: Connection timed out (Connection timed out)
>
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>
>         at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>
>         at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>
>         at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>
>         at java.net.Socket.connect(Socket.java:589)
>
>         at java.net.Socket.connect(Socket.java:538)
>
>         at
> org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
>
>         at
> org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
>
>         at org.apache.nifi.NiFi.<init>(NiFi.java:102)
>
>         at org.apache.nifi.NiFi.<init>(NiFi.java:71)
>
>         at org.apache.nifi.NiFi.main(NiFi.java:292)
>
> 2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating
> shutdown of Jetty web server...
>
> 2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web
> server shutdown completed (nicely or otherwise).
>
>
>
> Without clustering, the instances had no problem starting. Since this is
> our first experiment building a cluster, I’m not sure where to look for
> clues.
>
>
>
> Thanks in advance,
>
>
>
> Alexander
>
>
