nifi-dev mailing list archives

From Andy LoPresto <alopre...@apache.org>
Subject Re: Zookeeper - help!
Date Tue, 02 Oct 2018 02:59:50 GMT
Hi Phil,

Nathan’s advice is correct but I think he was assuming all other configurations are correct
as well. Are you trying to run both NiFi nodes and ZK instances on the same machine? In that
case you will have to ensure that the ports in use are different for each service so they
don’t conflict. Setting them all to the same value only works if each service is running
on an independent physical machine, virtual machine, or container.
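
As a rough sketch of the port separation described above: the fragment below assumes each NiFi node starts the embedded ZooKeeper and uses illustrative port numbers (10002 and 2180 appear elsewhere in this thread; 2888 and 3888 are only the conventional ZooKeeper quorum and election ports, assumed here). The point is that on any one host the cluster protocol port, the ZooKeeper client port, and both ports on that host's server.N line are all different from each other.

```properties
# --- nifi.properties on nifi1.domain (values are illustrative assumptions) ---
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi1.domain
# Port NiFi nodes use to talk to each other; must not match any ZooKeeper port
nifi.cluster.node.protocol.port=10002
# Assumption: the embedded ZooKeeper is started by NiFi on this node
nifi.state.management.embedded.zookeeper.start=true
# Ports here must match clientPort in zookeeper.properties on each host
nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180
nifi.zookeeper.root.node=/nifi

# --- zookeeper.properties on nifi1.domain (values are illustrative assumptions) ---
# Port ZooKeeper clients (including NiFi) connect to
clientPort=2180
# server.N=host:quorumPort:electionPort; neither port may equal clientPort
server.1=nifi1.domain:2888:3888
server.2=nifi2.domain:2888:3888
```

If both nodes and both ZooKeeper instances share one machine, the same idea applies, except that every port in the second instance's files also has to differ from every port in the first's.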

I find Pierre’s guide [1] to be a helpful step-by-step instruction list as well as a good
explanation of how the clustering concepts work in practice. When you get that working, and
you’re ready to set up a secure cluster, he has a follow-on guide for that as well [2].
Even as someone who has set up many clustered instances of NiFi, I use his guides regularly
to ensure I haven’t forgotten a step.

They were originally written for versions 1.0.0 and 1.1.0, but the only thing that has changed
is the authorizer configuration for the secure instances (you’ll need to put the Initial
Admin Identity and Node Identities in two locations in the authorizers.xml file instead of
just one).

Hopefully this helps you get a working cluster up and running so you can experiment. Good
luck.

[1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
[2] https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 1, 2018, at 2:45 PM, Phil H <gippyphil@gmail.com> wrote:
> 
> Thanks Nathan,
> 
> I changed the protocol.port to 10002 on both servers.
> 
> On server 1, I now just see endless copies of the second error from my original message (“KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss”) – I don’t know if that’s normal when there’s only a single member of a cluster alive and running? Seems like the logs will fill up very quickly if it is!
> 
> On server 2, I get a bind exception on the Zookeeper client port.  It doesn’t matter what I set it to (In this example, I changed it to 10500) I always get the same result.  If I run netstat when nifi isn’t running, there’s nothing listening on the port.  It’s like NiFi is starting two Zookeeper instances?!  There’s no repeat of this in the start up sequence though.  Both servers are running completely vanilla 1.6.0 – I don’t even have any flow defined yet, this is purely for teaching myself clustering config – so I don’t know why one is behaving differently to the other.
> 
> 2018-10-02 17:36:31,610 INFO [QuorumPeer[myid=2]/0.0.0.0:10500] o.a.zookeeper.server.ZooKeeperServer Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ./state/zookeeper/version-2 snapdir ./state/zookeeper/version-2
> 2018-10-02 17:36:31,612 ERROR [QuorumPeer[myid=2]/0.0.0.0:10500] o.apache.zookeeper.server.quorum.Leader Couldn't bind to nifi2.domain/192.168.10.102:10500
> java.net.BindException: Address already in use (Bind failed)
> 	at java.net.PlainSocketImpl.socketBind(Native Method)
> 	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> 	at java.net.ServerSocket.bind(ServerSocket.java:375)
> 	at java.net.ServerSocket.bind(ServerSocket.java:329)
> 	at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:193)
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:605)
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:798)
> 
> 
> 
> 
> From: Nathan Gough
> Sent: Tuesday, 2 October 2018 2:22 AM
> To: dev@nifi.apache.org
> Subject: Re: Zookeeper - help!
> 
> Hi Phil,
> 
> One thing I notice with your config is that the cluster.node.protocol.port and the zookeeper ports are the same - these should not be the same. Node.protocol.port is used by the NiFi cluster to communicate between nodes, whereas the zookeeper.connect.string port should be the port that the zookeeper service is listening on. The zookeeper port is configured by the clientPort property in the zookeeper.properties file. This would make your connect string: 'nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180', where 2180 is whatever clientPort is configured.
> 
> You can read more about how NiFi uses Zookeeper and how to configure it here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management.
> 
> Let us know what happens once these properties are configured correctly.
> 
> Nathan
> 
> 
> On 9/30/18, 11:07 PM, "Phil H" <gippyphil@gmail.com> wrote:
> 
>    Hi guys,
> 
>    Pulling my hair out trying to solve my Zookeeper problems.  I have two 1.6.0 servers that I am trying to cluster.
> 
>    Here is the excerpt from the properties files – all other properties are default so omitted for clarity.   The servers are set up to run HTTPS, and the interface works via the browser, so I believe the certificates are correctly installed.
> 
>    Server nifi1.domain:
>    nifi.cluster.is.node=true
>    nifi.cluster.node.address=nifi1.domain
>    nifi.cluster.node.protocol.port=10000
> 
>    nifi.zookeeper.connect.string=nifi2.domain:10000,nifi1.domain:10000
>    nifi.zookeeper.root.node=/nifi
> 
>    Server nifi2.domain:
>    nifi.cluster.is.node=true
>    nifi.cluster.node.address=nifi2.domain
>    nifi.cluster.node.protocol.port=10000
> 
>    nifi.zookeeper.connect.string=nifi1.domain:10000,nifi2.domain:10000
>    nifi.zookeeper.root.node=/nifi
> 
>    I am getting these errors (this is from server 2, but seeing the same on server 1 apart from a different address, of course):
> 
>    2018-10-01 20:54:16,332 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
>    2018-10-01 20:54:16,381 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
>    2018-10-01 20:54:16,435 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: nifi2.domain:443
>    2018-10-01 20:54:16,769 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
>    2018-10-01 20:54:16,771 WARN [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from nifi2 due to org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
>    org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
>            at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:225)
>            at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:131)
>            at org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:136)
>            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>            at java.lang.Thread.run(Thread.java:748)
>    Caused by: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
>            at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:314)
>            at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromSSLSocket(CertificateUtils.java:269)
>            at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:223)
>            ... 5 common frames omitted
>    Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
>            at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:440)
>            at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:299)
>            ... 7 common frames omitted
> 
> 
> 
>    2018-10-01 20:54:32,249 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>    2018-10-01 20:54:32,250 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
>    org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>            at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>            at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>            at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>            at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>            at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>            at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>            at java.lang.Thread.run(Thread.java:748)
> 
> 
> 
> 
> 

