nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: NiFi fails on cluster nodes
Date Mon, 15 Oct 2018 14:27:15 GMT
I'm not really sure; the error message indicates that either a
certificate was not sent during cluster communications, or possibly
the cert was not valid/trusted.

In this case, since it is only 1 node, it is the same node talking back
to itself, so the only parts involved here are the keystore and
truststore of that node, and the config in nifi.properties.

Maybe your truststore is not set up correctly to trust certs signed by
the CA that created the server cert?
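
A quick way to check that (a sketch, assuming JKS stores; the paths and passwords
are placeholders) is to list both stores with keytool and confirm that the issuer
of the server cert shows up as a trustedCertEntry in the truststore:

  keytool -list -v -keystore /path/to/truststore.jks -storepass <truststorePasswd>
  keytool -list -v -keystore /path/to/keystore.jks -storepass <keystorePasswd>

If the server cert's issuer DN has no matching trustedCertEntry in the truststore,
the cluster handshake can fail exactly this way.
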
On Mon, Oct 15, 2018 at 9:53 AM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.saip@nih.gov> wrote:
>
> Yes, 'nifi.cluster.protocol.is.secure' is set to 'true', since otherwise, NiFi would
> require values for 'nifi.web.http.host' and 'nifi.web.http.port'. We have a cert that
> is used to serve HTTPS requests to the NiFi web UI, and it works just fine.
>
> -----Original Message-----
> From: Bryan Bende <bbende@gmail.com>
> Sent: Monday, October 15, 2018 9:43 AM
> To: users@nifi.apache.org
> Subject: Re: NiFi fails on cluster nodes
>
> This is not related to ZooKeeper... I think you are missing something related to TLS/SSL
> configuration. Maybe you set the cluster protocol to be secure, but then you didn't
> configure NiFi with a keystore/truststore?
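>
> For reference, the TLS-related entries in nifi.properties would look roughly like this
> (the paths, passwords, and JKS type below are placeholders for whatever you actually use):
>
>   nifi.security.keystore=/opt/nifi/certs/keystore.jks
>   nifi.security.keystoreType=JKS
>   nifi.security.keystorePasswd=<keystore password>
>   nifi.security.keyPasswd=<key password>
>   nifi.security.truststore=/opt/nifi/certs/truststore.jks
>   nifi.security.truststoreType=JKS
>   nifi.security.truststorePasswd=<truststore password>
>   nifi.cluster.protocol.is.secure=true
>
> The truststore needs to contain the CA that signed the cert referenced by the keystore.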
>
> On Mon, Oct 15, 2018 at 9:41 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:
> >
> > Not sure what's going on here, but NiFi does not require a cert to setup ZooKeeper.
> >
> > Mike
> >
> > On Mon, Oct 15, 2018 at 9:39 AM Saip, Alexander (NIH/CC/BTRIS) [C]
> > <alexander.saip@nih.gov> wrote:
> >>
> >> Hi Mike and Bryan,
> >>
> >>
> >>
> >> I’ve installed and started ZooKeeper 3.4.13 and re-started a single NiFi node
> >> so far. Here is the error from the NiFi log:
> >>
> >>
> >>
> >> 2018-10-15 09:19:48,371 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
> >>
> >> 2018-10-15 09:19:48,425 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 0.0.0.0:8008
> >>
> >> 2018-10-15 09:19:48,452 ERROR [Process Cluster Protocol Request-2] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
> >>
> >> 2018-10-15 09:19:48,456 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
> >>
> >>
> >>
> >> It is likely extraneous to NiFi, but does this mean that we need to install a cert
> >> into ZooKeeper? Right now, both apps are running on the same box.
> >>
> >>
> >>
> >> Thank you.
> >>
> >>
> >>
> >> From: Mike Thomsen <mikerthomsen@gmail.com>
> >> Sent: Monday, October 15, 2018 9:02 AM
> >> To: users@nifi.apache.org
> >> Subject: Re: NiFi fails on cluster nodes
> >>
> >>
> >>
> >> http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> >>
> >>
> >>
> >> See the properties that start with "nifi.zookeeper."
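> >>
> >> For an external ZooKeeper, the relevant entries would look something like this
> >> (zk1/zk2/zk3 are placeholder hostnames; the timeout values are just illustrative):
> >>
> >>   nifi.state.management.embedded.zookeeper.start=false
> >>   nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
> >>   nifi.zookeeper.connect.timeout=3 secs
> >>   nifi.zookeeper.session.timeout=3 secs
> >>   nifi.zookeeper.root.node=/nifi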
> >>
> >>
> >>
> >> On Mon, Oct 15, 2018 at 8:58 AM Saip, Alexander (NIH/CC/BTRIS) [C]
> >> <alexander.saip@nih.gov> wrote:
> >>
> >> Mike,
> >>
> >>
> >>
> >> I wonder if you could point me to instructions on how to configure a cluster with
> >> an external instance of ZooKeeper? The NiFi Admin Guide talks exclusively about the
> >> embedded one.
> >>
> >>
> >>
> >> Thanks again.
> >>
> >>
> >>
> >> From: Mike Thomsen <mikerthomsen@gmail.com>
> >> Sent: Friday, October 12, 2018 10:17 AM
> >> To: users@nifi.apache.org
> >> Subject: Re: NiFi fails on cluster nodes
> >>
> >>
> >>
> >> It very well could become a problem down the road. The reason ZooKeeper usually runs
> >> on a dedicated machine is that you want it to have enough resources to always
> >> communicate within a quorum to reconcile configuration changes and feed configuration
> >> details to clients.
> >>
> >>
> >>
> >> That particular message is just a warning. From what I can tell, it's just telling
> >> you that no cluster coordinator has been elected and it's going to try to do something
> >> about that. It's usually a problem with embedded ZooKeeper because, by default, each
> >> node points to the ZooKeeper instance it fires up itself.
> >>
> >>
> >>
> >> For a development environment, a VM with 2GB of RAM and 1-2 CPU cores should
> >> be enough to run an external ZooKeeper.
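> >>
> >> A minimal zoo.cfg for a standalone external ZooKeeper would be something along these
> >> lines (the dataDir path is a placeholder):
> >>
> >>   tickTime=2000
> >>   dataDir=/var/lib/zookeeper
> >>   clientPort=2181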
> >>
> >>
> >>
> >> On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C]
> >> <alexander.saip@nih.gov> wrote:
> >>
> >> Thanks Mike. We will get an external ZooKeeper instance deployed. I guess co-locating
> >> it with one of the NiFi nodes shouldn’t be an issue, or will it? We are chronically
> >> short of hardware. BTW, does the following message in the logs point to some sort of
> >> problem with the embedded ZooKeeper?
> >>
> >>
> >>
> >> 2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> >>
> >> 2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> >>
> >> 2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> >>
> >> 2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED
> >>
> >>
> >>
> >> From: Mike Thomsen <mikerthomsen@gmail.com>
> >> Sent: Friday, October 12, 2018 8:33 AM
> >> To: users@nifi.apache.org
> >> Subject: Re: NiFi fails on cluster nodes
> >>
> >>
> >>
> >> Also, in a production environment NiFi should have its own dedicated ZooKeeper cluster
> >> to be on the safe side. You should not reuse ZooKeeper quorums (e.g. have HBase and
> >> NiFi point to the same quorum).
> >>
> >>
> >>
> >> On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:
> >>
> >> Alexander,
> >>
> >>
> >>
> >> I am pretty sure your problem is here:
> >> nifi.state.management.embedded.zookeeper.start=true
> >>
> >>
> >>
> >> That spins up an embedded ZooKeeper, which is generally intended to be used for local
> >> development. For example, HBase provides the same feature, but it is intended to allow
> >> you to test a real HBase client application against a single node of HBase running
> >> locally.
> >>
> >>
> >>
> >> Here is what you need to try:
> >>
> >>
> >>
> >> 1. Set up an external ZooKeeper instance (or set up 3 in a quorum; the number of
> >> nodes must be odd). A sketch of the quorum config follows below.
> >>
> >> 2. Update nifi.properties on each node to use the external ZooKeeper setup.
> >>
> >> 3. Restart all of them.
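> >>
> >> For step 1, a rough sketch of the quorum setup (zk1/zk2/zk3 are placeholder hostnames):
> >> each server's zoo.cfg lists all three members, and each server gets a myid file whose
> >> number matches its own entry:
> >>
> >>   server.1=zk1:2888:3888
> >>   server.2=zk2:2888:3888
> >>   server.3=zk3:2888:3888
> >>
> >> For step 2, set nifi.zookeeper.connect.string on every node to that quorum
> >> (zk1:2181,zk2:2181,zk3:2181) and turn off nifi.state.management.embedded.zookeeper.start.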
> >>
> >>
> >>
> >> See if that works.
> >>
> >>
> >>
> >> Mike
> >>
> >>
> >>
> >> On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C]
> >> <alexander.saip@nih.gov> wrote:
> >>
> >> nifi.cluster.node.protocol.port=11443 by default on all nodes; I haven’t touched that
> >> property. Yesterday, we discovered some issues preventing two of the boxes from
> >> communicating. Now, they can talk okay. Ports 11443, 2181 and 3888 are explicitly open
> >> in iptables, but clustering still doesn’t happen. The log files are filled up with
> >> errors like this:
> >>
> >>
> >>
> >> 2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
> >> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
> >>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
> >>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
> >>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
> >>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>         at java.lang.Thread.run(Thread.java:748)
> >>
> >>
> >>
> >> Is there anything else we should check?
> >>
> >>
> >>
> >> From: Nathan Gough <thenatog@gmail.com>
> >> Sent: Thursday, October 11, 2018 9:12 AM
> >> To: users@nifi.apache.org
> >> Subject: Re: NiFi fails on cluster nodes
> >>
> >>
> >>
> >> You may also need to explicitly open ‘nifi.cluster.node.protocol.port’ on all nodes
> >> to allow cluster communication for cluster heartbeats etc.
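> >>
> >> With iptables, that would be something like this on each node (the port below is a
> >> placeholder; use whatever nifi.cluster.node.protocol.port is set to):
> >>
> >>   iptables -I INPUT -p tcp --dport 11443 -j ACCEPT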
> >>
> >>
> >>
> >> From: ashmeet kandhari <ashmeetkandhari93@gmail.com>
> >> Reply-To: <users@nifi.apache.org>
> >> Date: Thursday, October 11, 2018 at 9:09 AM
> >> To: <users@nifi.apache.org>
> >> Subject: Re: NiFi fails on cluster nodes
> >>
> >>
> >>
> >> Hi Alexander,
> >>
> >>
> >>
> >> Can you verify connectivity between the 3 nodes (a TCP ping), or run NiFi in
> >> standalone mode and see if you can reach each node from the other 2 servers, just to
> >> be sure they can communicate with one another?
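> >>
> >> A quick way to test that from each box (the hostname and port are placeholders; use
> >> the other nodes' hostnames and the cluster protocol port you configured):
> >>
> >>   nc -vz nifi-node2.example.com 11443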
> >>
> >>
> >>
> >> On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C]
> >> <alexander.saip@nih.gov> wrote:
> >>
> >> How do I do that? The nifi.properties file on each node includes
> >> ‘nifi.state.management.embedded.zookeeper.start=true’, so I assume ZooKeeper does
> >> start.
> >>
> >>
> >>
> >> From: ashmeet kandhari <ashmeetkandhari93@gmail.com>
> >> Sent: Thursday, October 11, 2018 4:36 AM
> >> To: users@nifi.apache.org
> >> Subject: Re: NiFi fails on cluster nodes
> >>
> >>
> >>
> >> Can you see if the ZooKeeper node is up and running and can connect to the NiFi
> >> nodes?
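> >>
> >> For example, ZooKeeper answers four-letter-word commands on its client port, so
> >> something like this (host and port are placeholders for your ZooKeeper endpoint)
> >> should print "imok" if it is up:
> >>
> >>   echo ruok | nc zk-host 2181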
> >>
> >>
> >>
> >> On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C]
> >> <alexander.saip@nih.gov> wrote:
> >>
> >> Hello,
> >>
> >>
> >>
> >> We have three NiFi 1.7.1 nodes originally configured as independent instances, each
> >> on its own server. There is no firewall between them. When I tried to build a cluster
> >> following the instructions here, NiFi failed to start on all of them, despite the fact
> >> that I even set nifi.cluster.protocol.is.secure=false in the nifi.properties file on
> >> each node. Here is the error in the log files:
> >>
> >>
> >>
> >> 2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
> >>
> >> 2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
> >>
> >> 2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
> >>
> >> 2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
> >>
> >> 2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
> >>
> >> 2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
> >> java.net.ConnectException: Connection timed out (Connection timed out)
> >>         at java.net.PlainSocketImpl.socketConnect(Native Method)
> >>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> >>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> >>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> >>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> >>         at java.net.Socket.connect(Socket.java:589)
> >>         at java.net.Socket.connect(Socket.java:538)
> >>         at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
> >>         at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
> >>         at org.apache.nifi.NiFi.<init>(NiFi.java:102)
> >>         at org.apache.nifi.NiFi.<init>(NiFi.java:71)
> >>         at org.apache.nifi.NiFi.main(NiFi.java:292)
> >>
> >> 2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
> >>
> >> 2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
> >>
> >>
> >>
> >> Without clustering, the instances had no problem starting. Since this is our first
> >> experiment building a cluster, I’m not sure where to look for clues.
> >>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >>
> >> Alexander
