nifi-users mailing list archives

From Jeff <jtsw...@gmail.com>
Subject Re: Phantom node
Date Fri, 05 May 2017 05:14:31 GMT
Neil,

Quick question... by "The nodes that are healthy are all able to ping each
other", do you mean that one of the nodes is not able to ping the other nodes?

On Thu, May 4, 2017 at 11:23 AM Neil Derraugh <neil.derraugh@intellifylearning.com> wrote:

> Our three ZooKeepers are external.  The nodes that are healthy are all
> able to ping each other.  nifi.properties follows (with minor modifications
> around hostnames/IPs).
>
> # Core Properties #
> nifi.version=1.1.2
> nifi.flow.configuration.file=/mnt/mesos/sandbox/conf/flow.xml.gz
> nifi.flow.configuration.archive.enabled=true
> nifi.flow.configuration.archive.dir=/mnt/mesos/sandbox/conf/archive/
> nifi.flow.configuration.archive.max.time=30 days
> nifi.flow.configuration.archive.max.storage=500 MB
> nifi.flowcontroller.autoResumeState=true
> nifi.flowcontroller.graceful.shutdown.period=10 sec
> nifi.flowservice.writedelay.interval=500 ms
> nifi.administrative.yield.duration=30 sec
> # If a component has no work to do (is "bored"), how long should we wait before checking again for work?
> nifi.bored.yield.duration=10 millis
> nifi.authorizer.configuration.file=/mnt/mesos/sandbox/conf/authorizers.xml
> nifi.login.identity.provider.configuration.file=/mnt/mesos/sandbox/conf/login-identity-providers.xml
> nifi.templates.directory=/mnt/mesos/sandbox/conf/templates
> nifi.ui.banner.text=master - dev3
> nifi.ui.autorefresh.interval=30 sec
> nifi.nar.library.directory=/opt/nifi/lib
> nifi.nar.library.directory.custom=/mnt/mesos/sandbox/lib
> nifi.nar.working.directory=/mnt/mesos/sandbox/work/nar/
> nifi.documentation.working.directory=/mnt/mesos/sandbox/work/docs/components
> ####################
> # State Management #
> ####################
> nifi.state.management.configuration.file=/mnt/mesos/sandbox/conf/state-management.xml
> # The ID of the local state provider
> nifi.state.management.provider.local=local-provider
> # The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
> nifi.state.management.provider.cluster=zk-provider
> # Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
> nifi.state.management.embedded.zookeeper.start=false
> # Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
> nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
> # H2 Settings
> nifi.database.directory=/mnt/mesos/sandbox/data/database_repository
> nifi.h2.url.append=;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE
> # FlowFile Repository
> nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
> nifi.flowfile.repository.directory=/mnt/mesos/sandbox/data/flowfile_repository
> nifi.flowfile.repository.partitions=256
> nifi.flowfile.repository.checkpoint.interval=2 mins
> nifi.flowfile.repository.always.sync=false
> nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
> nifi.queue.swap.threshold=20000
> nifi.swap.in.period=5 sec
> nifi.swap.in.threads=1
> nifi.swap.out.period=5 sec
> nifi.swap.out.threads=4
> # Content Repository
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> nifi.content.claim.max.appendable.size=10 MB
> nifi.content.claim.max.flow.files=100
> nifi.content.repository.directory.default=/mnt/mesos/sandbox/data/content_repository
> nifi.content.repository.archive.max.retention.period=12 hours
> nifi.content.repository.archive.max.usage.percentage=50%
> nifi.content.repository.archive.enabled=true
> nifi.content.repository.always.sync=false
> nifi.content.viewer.url=/nifi-content-viewer/
> # Provenance Repository Properties
> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
> # Persistent Provenance Repository Properties
> nifi.provenance.repository.directory.default=/mnt/mesos/sandbox/data/provenance_repository
> nifi.provenance.repository.max.storage.time=24 hours
> nifi.provenance.repository.max.storage.size=1 GB
> nifi.provenance.repository.rollover.time=30 secs
> nifi.provenance.repository.rollover.size=100 MB
> nifi.provenance.repository.query.threads=2
> nifi.provenance.repository.index.threads=1
> nifi.provenance.repository.compress.on.rollover=true
> nifi.provenance.repository.always.sync=false
> nifi.provenance.repository.journal.count=16
> # Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
> # FlowFile Attributes that should be indexed and made searchable.  Some examples to consider are filename, uuid, mime.type
> nifi.provenance.repository.indexed.attributes=
> # Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
> # but should provide better performance
> nifi.provenance.repository.index.shard.size=500 MB
> # Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
> # the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
> nifi.provenance.repository.max.attribute.length=65536
> # Volatile Provenance Repository Properties
> nifi.provenance.repository.buffer.size=100000
> # Component Status Repository
> nifi.components.status.repository.implementation=org.apache.nifi.controller.status.history.VolatileComponentStatusRepository
> nifi.components.status.repository.buffer.size=1440
> nifi.components.status.snapshot.frequency=1 min
> # Site to Site properties
> nifi.remote.input.host=w.x.y.z
> nifi.remote.input.secure=false
> nifi.remote.input.socket.port=31310
> nifi.remote.input.http.enabled=true
> nifi.remote.input.http.transaction.ttl=30 sec
> # web properties #
> nifi.web.war.directory=/opt/nifi/lib
> nifi.web.http.host=w.x.y.z
> nifi.web.http.port=31308
> nifi.web.https.host=
> #w.x.y.z
> nifi.web.https.port=
> #31268
> nifi.web.jetty.working.directory=/mnt/mesos/sandbox/work/jetty
> nifi.web.jetty.threads=200
> # security properties #
> nifi.sensitive.props.key=
> nifi.sensitive.props.key.protected=
> nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
> nifi.sensitive.props.provider=BC
> nifi.sensitive.props.additional.keys=
> nifi.security.keystore=
> nifi.security.keystoreType=
> nifi.security.keystorePasswd=
> nifi.security.keyPasswd=
> nifi.security.truststore=
> nifi.security.truststoreType=
> nifi.security.truststorePasswd=
> nifi.security.needClientAuth=
> nifi.security.user.authorizer=file-provider
> nifi.security.user.login.identity.provider=
> #ldap-provider
> nifi.security.ocsp.responder.url=
> nifi.security.ocsp.responder.certificate=
> # Identity Mapping Properties #
> # These properties allow normalizing user identities such that identities coming from different identity providers
> # (certificates, LDAP, Kerberos) can be treated the same internally in NiFi. The following example demonstrates normalizing
> # DNs from certificates and principals from Kerberos into a common identity string:
> #
> # nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
> # nifi.security.identity.mapping.value.dn=$1@$2
> # nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
> # nifi.security.identity.mapping.value.kerb=$1@$2
> # cluster common properties (all nodes must have same values) #
> nifi.cluster.protocol.heartbeat.interval=5 sec
> nifi.cluster.protocol.is.secure=false
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=w.x.y.z
> nifi.cluster.node.protocol.port=31267
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=1 mins
> nifi.cluster.flow.election.max.candidates=3
> # zookeeper properties, used for cluster management #
> nifi.zookeeper.connect.string=zookeepers.some.uri:2181
> nifi.zookeeper.connect.timeout=3 secs
> nifi.zookeeper.session.timeout=3 secs
> nifi.zookeeper.root.node=/nifi
> # kerberos #
> nifi.kerberos.krb5.file=
> # kerberos service principal #
> nifi.kerberos.service.principal=
> nifi.kerberos.service.keytab.location=
> # kerberos spnego principal #
> nifi.kerberos.spnego.principal=
> nifi.kerberos.spnego.keytab.location=
> nifi.kerberos.spnego.authentication.expiration=12 hours
> # external properties files for variable registry
> # supports a comma delimited list of file locations
> nifi.variable.registry.properties=
> # Build info
> nifi.build.tag=nifi-1.1.2-RC1
> nifi.build.branch=NIFI-3486-RC1
> nifi.build.revision=744cfe6
> nifi.build.timestamp=2017-02-16T01:48:27Z
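>
> Since the ZooKeepers are external, one quick sanity check is ZooKeeper's
> "ruok" four-letter command, which a serving node answers with "imok".  A
> sketch (Python; the host below is the sanitized placeholder from the
> connect string above):
>
> import socket
>
> # Probe a ZooKeeper server with the "ruok" health command.
> def zk_ruok(host, port=2181, timeout=3):
>     with socket.create_connection((host, port), timeout=timeout) as s:
>         s.sendall(b"ruok")
>         return s.recv(4) == b"imok"
>
> print(zk_ruok("zookeepers.some.uri"))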
>
> On Wed, May 3, 2017 at 9:27 PM, Jeff <jtswork@gmail.com> wrote:
>
>> Can you provide some information on the configuration (nifi.properties)
>> of the nodes in your cluster?  Can each node in your cluster ping all the
>> other nodes?  Are you running embedded ZooKeeper, or an external one?
>>
>> On Wed, May 3, 2017 at 8:11 PM Neil Derraugh <neil.derraugh@intellifylearning.com> wrote:
>>
>>> I can't load the canvas right now on our cluster.  I get this error in
>>> one of the nodes' nifi-app.log:
>>>
>>> 2017-05-03 23:40:30,207 WARN [Replicate Request Thread-2] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to 10.80.53.39:31212 due to {}
>>> com.sun.jersey.api.client.ClientHandlerException: java.net.NoRouteToHostException: Host is unreachable (Host unreachable)
>>>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
>>>     at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
>>>     at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123) ~[jersey-client-1.19.jar:1.19]
>>>     at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
>>>     at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
>>>     at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509) ~[jersey-client-1.19.jar:1.19]
>>>     at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:579) ~[nifi-framework-cluster-1.1.2.jar:1.1.2]
>>>     at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:771) ~[nifi-framework-cluster-1.1.2.jar:1.1.2]
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
>>> Caused by: java.net.NoRouteToHostException: Host is unreachable (Host unreachable)
>>>     at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_121]
>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_121]
>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_121]
>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_121]
>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_121]
>>>     at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_121]
>>>     at sun.net.NetworkClient.doConnect(NetworkClient.java:175) ~[na:1.8.0_121]
>>>     at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) ~[na:1.8.0_121]
>>>     at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) ~[na:1.8.0_121]
>>>     at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) ~[na:1.8.0_121]
>>>     at sun.net.www.http.HttpClient.New(HttpClient.java:308) ~[na:1.8.0_121]
>>>     at sun.net.www.http.HttpClient.New(HttpClient.java:326) ~[na:1.8.0_121]
>>>     at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202) ~[na:1.8.0_121]
>>>     at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138) ~[na:1.8.0_121]
>>>     at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032) ~[na:1.8.0_121]
>>>     at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966) ~[na:1.8.0_121]
>>>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546) ~[na:1.8.0_121]
>>>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) ~[na:1.8.0_121]
>>>     at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_121]
>>>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
>>>     at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
>>> ... 12 common frames omitted
>>> 2017-05-03 23:40:30,207 WARN [Replicate Request Thread-2] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
>>> com.sun.jersey.api.client.ClientHandlerException: java.net.NoRouteToHostException: Host is unreachable (Host unreachable)
>>> [identical stack trace omitted]
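>>>
>>> The NoRouteToHostException is raised in the plain socket connect, so the
>>> failure reproduces outside NiFi.  A minimal sketch (Python; the IP and
>>> port are the ones from the log above):
>>>
>>> import socket
>>>
>>> # Raw TCP connect to the unreachable node, mirroring what the replicator does.
>>> try:
>>>     socket.create_connection(("10.80.53.39", 31212), timeout=5).close()
>>>     print("connected")
>>> except OSError as e:
>>>     print("connect failed: %s" % e)  # e.g. [Errno 113] No route to host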
>>>
>>>
>>> GETting /nifi-api/controller/cluster, I get the list of happy nodes and
>>> then this one (IP address changed to protect the innocent):
>>> {
>>>   "nodeId": "7f7c1a9e-faa6-413b-9317-bcec4996cb14",
>>>   "address": "w.x.y.z",
>>>   "apiPort": 31212,
>>>   "status": "CONNECTED",
>>>   "roles": [],
>>>   "events": []
>>> }
>>>
>>> No events, no roles.  It says it's connected, but it has no heartbeat and
>>> it's not part of the list of running jobs so far as I can tell.  It's
>>> likely a node that had previously been a healthy member of the cluster.
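>>>
>>> (For reference, the node list above comes from GET /nifi-api/controller/cluster;
>>> a sketch of the query in Python -- the base URL is a placeholder, and it
>>> assumes the third-party requests package:)
>>>
>>> import requests
>>>
>>> # Fetch the cluster view and print each node's reported status.
>>> resp = requests.get("http://w.x.y.z:31308/nifi-api/controller/cluster")
>>> resp.raise_for_status()
>>> for node in resp.json()["cluster"]["nodes"]:
>>>     print(node["address"], node["apiPort"], node["status"],
>>>           node.get("heartbeat", "no heartbeat"))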
>>>
>>> Can anybody help me interpret this?
>>>
>>> I deleted the node and am carrying on as usual.  Just wondering if
>>> anyone has any insight into why NiFi would leave the node in the cluster
>>> and show it as connected.
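>>>
>>> (The removal can also be scripted against the same API -- a sketch,
>>> assuming an unsecured cluster like this one, with the node ID from the
>>> JSON above; NiFi expects a node to be disconnected before it is deleted:)
>>>
>>> import requests
>>>
>>> base = "http://w.x.y.z:31308/nifi-api"
>>> node_id = "7f7c1a9e-faa6-413b-9317-bcec4996cb14"
>>>
>>> # Ask the cluster coordinator to disconnect the node first...
>>> requests.put("%s/controller/cluster/nodes/%s" % (base, node_id),
>>>              json={"node": {"nodeId": node_id, "status": "DISCONNECTING"}}
>>>              ).raise_for_status()
>>>
>>> # ...then drop it from the cluster entirely.
>>> requests.delete("%s/controller/cluster/nodes/%s" % (base, node_id)).raise_for_status()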
>>>
>>> Thanks,
>>> Neil
>>>
>>
>
