nifi-users mailing list archives

From Matthew Clarke <matt.clarke....@gmail.com>
Subject Re: Nifi cluster features - Questions
Date Mon, 11 Jan 2016 21:47:15 GMT
Chakri,
            What Mark is saying is that the NiFi Remote Process Group (RPG), also
known as Site-to-Site, will load-balance delivery of data to all nodes in a
cluster.  It cannot be configured to balance data to only a subset of the
nodes in a cluster.  If this is the strategy you want to deploy, a
different approach must be taken (one that does not use Site-to-Site).
Here is a NiFi diagram of one such approach using your example of a 10 node
cluster:

[image: Inline image 1]
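
In rough textual form, that kind of approach for a 10 node cluster might look
like the following (illustrative only: the 2-node groups come from the example
in this thread, the ListenHTTP/PostHTTP chain follows Mark's suggestion further
down, and the node names are placeholders):

    Source application or routing flow
      --> PostHTTP #1 --> ListenHTTP on node1, node2   (group 1)
      --> PostHTTP #2 --> ListenHTTP on node3, node4   (group 2)
      ...
      --> PostHTTP #5 --> ListenHTTP on node9, node10  (group 5)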



On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <
Chakrader.Dewaragatla@lifelock.com> wrote:

> Mark - Correct me if I understood this right.
>
> Curl POST from some application —> ListenHTTP (on primary node)
> --> PostHTTP with the FlowFile data (on primary node?) --> post to the
> site-to-site endpoint —> this in turn distributes the load to both slaves.
>
> Thanks,
> -Chakri
>
> From: Mark Payne <markap14@hotmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Monday, January 11, 2016 at 12:29 PM
>
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Chakri,
>
> At this time, your only options are to run the processors on all nodes or
> a single node (Primary Node). There's no way to really group nodes together
> and say "only run on this set of nodes."
>
> One option is to have a ListenHTTP Processor and then push data to that
> NiFi via PostHTTP (configure it to send FlowFile attributes along). By
> doing this, you could set up the sending NiFi
> to only deliver data to two nodes. You could then have a different set of
> data going to a different two nodes, etc. by the way that you configure
> which data goes to which PostHTTP Processor.
>
> Does this give you what you need?
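
To make the routing idea concrete, here is a minimal sketch from the point of
view of an external sender (in the approach above the sender would be another
NiFi using PostHTTP, so this is only an analogy). It assumes each node runs
ListenHTTP on port 8081 with a base path of contentListener; the hostnames,
port, data categories, and round-robin logic are all made up for illustration:

    import urllib.request

    # Hypothetical assignment of data categories to 2-node groups; data for a
    # category only ever reaches the two nodes assigned to that category.
    NODE_PAIRS = {
        "orders":  ["http://node1:8081/contentListener",
                    "http://node2:8081/contentListener"],
        "billing": ["http://node3:8081/contentListener",
                    "http://node4:8081/contentListener"],
    }

    def post(category, payload, counter):
        """Round-robin one payload across the two nodes owning this category."""
        urls = NODE_PAIRS[category]
        req = urllib.request.Request(urls[counter % len(urls)],
                                     data=payload, method="POST")
        with urllib.request.urlopen(req) as resp:
            resp.read()  # a 2xx response means the node accepted the data

    for i in range(4):
        post("orders", ("example payload %d" % i).encode(), i)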
>
>
> On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com> wrote:
>
> Thanks Mark. I will look into it.
>
> Couple of questions:
>
>
>    - Going back to my earlier question: in a NiFi cluster with two slaves
>    and an NCM, how do I make the two slaves accept and process the incoming
>    flowfiles in a distributed fashion? Is site-to-site the only way to go?
>    In our use case, we have an HTTP listener running on the primary node,
>    and the PutFile processor should run on the two slaves in a distributed
>    fashion.
>
>    It is more like a new (or existing) feature request.
>     - In a NiFi cluster setup, can we group the machines and set up
>    site-to-site to an individual group?
>     For instance, if I have a 10 node cluster, can I group it into 5 groups
>    with two nodes each and run processors on a dedicated group (using
>    site-to-site or other means)?
>
> Thanks,
> -Chakri
>
> From: Mark Payne <markap14@hotmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Monday, January 11, 2016 at 5:24 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Chakri,
>
> This line in the logs is particularly interesting (on primary node):
>
> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>
>
> This indicates that all of the site-to-site data will go to the host
> i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node
> listed, this means
> that the NCM responded, indicating that this is the only node in the
> cluster that is currently connected and has site-to-site enabled. Can you
> double-check the nifi.properties
> file on the Primary Node and verify that the "nifi.remote.input.socket.port"
> property is specified, and that the "nifi.remote.input.secure" property is
> set to "false"?
> Of note is that if the "nifi.remote.input.secure" property is set to
> true, but keystore and truststore are not specified, then site-to-site will
> be disabled (there would be a warning
> in the log in this case).
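
For reference, a minimal sketch of what those two entries should look like in
nifi.properties on each node (10880 is just the port used elsewhere in this
thread; any free port works):

    # nifi.properties, on every node that should accept site-to-site transfers
    nifi.remote.input.socket.port=10880
    nifi.remote.input.secure=false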
>
> If you can verify that both of those properties are set properly on both
> nodes, then we can delve in further, but probably best to start by
> double-checking the easy things :)
>
> Thanks
> -Mark
>
>
> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com> wrote:
>
> Bryan – Here are the logs:
> I have a flow file generated every 5 sec.
>
> On primary node (No data coming in)
>
> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2]
> o.a.nifi.controller.StandardFlowService Saved flow controller
> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
> pending = false
> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16]
> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1
> threads
> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2]
> o.a.nifi.controller.StandardFlowService Saved flow controller
> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
> pending = false
> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7]
> o.a.nifi.remote.StandardRemoteGroupPort
> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
> Successfully sent
> [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
> section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]]
> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
> milliseconds at a rate of 392 bytes/sec
> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1]
> o.a.nifi.remote.StandardRemoteGroupPort
> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
> Successfully sent
> [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
> section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]]
> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51
> milliseconds at a rate of 391 bytes/sec
> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Received request
> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
> o.a.nifi.controller.StandardFlowService Received flow request message from
> manager.
> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes)
> in 61 millis
> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
> for events starting with ID 301
> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
> (6 records) into single Provenance Log File
> ./provenance_repository/295.prov in 111 milliseconds
> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
> Event file containing 8 records
> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10]
> o.a.nifi.remote.StandardRemoteGroupPort
> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
> Successfully sent
> [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
> section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]]
> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
> milliseconds at a rate of 392 bytes/sec
> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22]
> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
> StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa]
> to run
> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26]
> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25]
> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2]
> o.a.nifi.controller.StandardFlowService Saved flow controller
> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
> pending = false
> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
>
>
> On Secondary node (Data coming in)
>
> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1]
> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
> Repository with 0 records in 88 milliseconds
> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3]
> o.a.n.r.p.s.SocketFlowFileServerProtocol
> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
> Successfully received
> [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]]
> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
> at a rate of 387 bytes/sec
> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1]
> o.a.nifi.processors.standard.PutFile
> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
> StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]
> at location /root/putt/275243509297560
> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
> for events starting with ID 17037
> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
> (6 records) into single Provenance Log File
> ./provenance_repository/17031.prov in 56 milliseconds
> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
> Event file containing 10 records
> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Received request
> e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
> o.a.nifi.controller.StandardFlowService Received flow request message from
> manager.
> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
> e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes)
> in 76 millis
> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6]
> o.a.n.r.p.s.SocketFlowFileServerProtocol
> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
> Successfully received
> [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]]
> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
> at a rate of 387 bytes/sec
> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8]
> o.a.nifi.processors.standard.PutFile
> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
> StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]
> at location /root/putt/275248510432074
> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
> From: Bryan Bende <bbende@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Sunday, January 10, 2016 at 2:43 PM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Chakri,
>
> Glad you got site-to-site working.
>
> Regarding the data distribution, I'm not sure why it is behaving that way.
> I just did a similar test running ncm, node1, and node2 all on my local
> machine, with GenerateFlowFile running every 10 seconds, and Input Port
> going to a LogAttribute, and I see it alternating between node1 and node2
> logs every 10 seconds.
>
> Is there anything in your primary node logs
> (primary_node/logs/nifi-app.log) when you see the data on the other node?
>
> -Bryan
>
>
> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <joe.witt@gmail.com> wrote:
>
>> Chakri,
>>
>> Would love to hear what you've learned and how that differed from the
>> docs themselves.  Site-to-site has proven difficult to set up, so we're
>> clearly not there yet in having the right operator/admin experience.
>>
>> Thanks
>> Joe
>>
>> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
>> <Chakrader.Dewaragatla@lifelock.com> wrote:
>> > I was able to get site-to-site to work.
>> > I tried to follow your instructions to send data distributed across the
>> > nodes.
>> >
>> > GenerateFlowFile (On Primary) —> RPG
>> > RPG —> Input Port —> PutFile (Timer driven scheduling)
>> >
>> > However, data is only written to one slave (the secondary slave). The
>> > primary slave has no data.
>> >
>> > Image screenshot :
>> > http://tinyurl.com/jjvjtmq
>> >
>> > From: Chakrader Dewaragatla <chakrader.dewaragatla@lifelock.com>
>> > Date: Sunday, January 10, 2016 at 11:26 AM
>> >
>> > To: "users@nifi.apache.org" <users@nifi.apache.org>
>> > Subject: Re: Nifi cluster features - Questions
>> >
>> > Bryan – Thanks – I am trying to set up site-to-site.
>> > I have two slaves and one NCM.
>> >
>> > My properties are as follows:
>> >
>> > On both Slaves:
>> >
>> > nifi.remote.input.socket.port=10880
>> > nifi.remote.input.secure=false
>> >
>> > On NCM:
>> > nifi.remote.input.socket.port=10880
>> > nifi.remote.input.secure=false
>> >
>> > When I try to drop a remote process group (with http://<NCM IP>:8080/nifi),
>> > I see the following errors for the two nodes.
>> >
>> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
>> > communication
>> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
>> > communication
>> >
>> > Do you have insight into why it's trying to connect to 8080 on the slaves?
>> > Where does the 10880 port come into the picture? I remember trying to set
>> > up site-to-site a few months back and succeeding.
>> >
>> > Thanks,
>> > -Chakri
>> >
>> >
>> >
>> > From: Bryan Bende <bbende@gmail.com>
>> > Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> > Date: Saturday, January 9, 2016 at 11:22 AM
>> > To: "users@nifi.apache.org" <users@nifi.apache.org>
>> > Subject: Re: Nifi cluster features - Questions
>> >
>> > The sending node (where the remote process group is) will distribute the
>> > data evenly across the two nodes, so an individual file will only be sent
>> > to one of the nodes. You could think of it as if a separate NiFi instance
>> > were sending directly to a two node cluster: it would be evenly
>> > distributing the data across the two nodes. In this case it just so
>> > happens to all be within the same cluster.
>> >
>> > The most common use case for this scenario is the List and Fetch
>> > processors, such as the HDFS ones. You can perform the listing on the
>> > primary node, and then distribute the results so the fetching takes place
>> > on all nodes.
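
Sketched in the same arrow notation used elsewhere in this thread, and assuming
the standard ListHDFS/FetchHDFS processors, that pattern would look roughly
like:

    ListHDFS (scheduled on Primary Node only) --> Remote Process Group (cluster URL)
    Remote Process Group --> Input Port --> FetchHDFS --> downstream processing (all nodes)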
>> >
>> > On Saturday, January 9, 2016, Chakrader Dewaragatla
>> > <Chakrader.Dewaragatla@lifelock.com> wrote:
>> >>
>> >> Bryan – Thanks, how do the nodes distribute the load for an input port?
>> >> As the port is open and listening on two nodes, does it copy the same
>> >> files to both nodes?
>> >> I need to try this setup to see the results, appreciate your help.
>> >>
>> >> Thanks,
>> >> -Chakri
>> >>
>> >> From: Bryan Bende <bbende@gmail.com>
>> >> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> >> Date: Friday, January 8, 2016 at 3:44 PM
>> >> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> >> Subject: Re: Nifi cluster features - Questions
>> >>
>> >> Hi Chakri,
>> >>
>> >> I believe the DistributeLoad processor is more for load balancing when
>> >> sending to downstream systems. For example, if you had two HTTP endpoints,
>> >> you could have the first relationship from DistributeLoad going to a
>> >> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> >> second PostHTTP that goes to endpoint #2.
>> >>
>> >> If you want to distribute the data within the cluster, then you need to
>> >> use site-to-site. The way you do this is the following...
>> >>
>> >> - Add an Input Port connected to your PutFile.
>> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> >> Remote Process Group. The Remote Process Group should be connected to the
>> >> Input Port from the previous step.
>> >>
>> >> So both nodes have an input port listening for data, but only the primary
>> >> node produces a FlowFile and sends it to the RPG, which then re-distributes
>> >> it back to one of the Input Ports.
>> >>
>> >> In order for this to work you need to set nifi.remote.input.socket.port in
>> >> nifi.properties to some available port, and you probably want
>> >> nifi.remote.input.secure=false for testing.
>> >>
>> >> -Bryan
>> >>
>> >>
>> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> >> <Chakrader.Dewaragatla@lifelock.com> wrote:
>> >>>
>> >>> Mark – I have set up a two node cluster and tried the following.
>> >>>  GenerateFlowFile processor (run only on primary node) —> DistributeLoad
>> >>> processor (RoundRobin) —> PutFile
>> >>>
>> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>> >>> >> run on primary node only).
>> >>> From your above comment, it should put files on both nodes. It puts
>> >>> files on the primary node only. Any thoughts?
>> >>>
>> >>> Thanks,
>> >>> -Chakri
>> >>>
>> >>> From: Mark Payne <markap14@hotmail.com>
>> >>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
>> >>>
>> >>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> >>> Subject: Re: Nifi cluster features - Questions
>> >>>
>> >>> Chakri,
>> >>>
>> >>> Correct - when NiFi instances are clustered, they do not transfer data
>> >>> between the nodes. This is very different
>> >>> than you might expect from something like Storm or Spark, as the key
>> >>> goals and design are quite different.
>> >>> We have discussed providing the ability to allow the user to indicate
>> >>> that they want to have the framework
>> >>> do load balancing for specific connections in the background, but it's
>> >>> still in more of a discussion phase.
>> >>>
>> >>> Site-to-Site is simply the capability that we have developed to transfer
>> >>> data between one instance of NiFi and another instance of NiFi. So
>> >>> currently, if we want to do load balancing across the cluster, we would
>> >>> create a site-to-site connection (by dragging a Remote Process Group onto
>> >>> the graph) and give that site-to-site connection the URL of our cluster.
>> >>> That way, you can push data to your own cluster, effectively providing a
>> >>> load balancing capability.
>> >>>
>> >>> If you were to just run ListenHTTP without setting it to Primary Node,
>> >>> then every node in the cluster will be listening
>> >>> for incoming HTTP connections. So you could then use a simple load
>> >>> balancer in front of NiFi to distribute the load
>> >>> across your cluster.
>> >>>
>> >>> Does this help? If you have any more questions we're happy to help!
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>>
>> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>> >>> <Chakrader.Dewaragatla@lifelock.com> wrote:
>> >>>
>> >>> Mark - Thanks for the notes.
>> >>>
>> >>> >> The other option would be to have a ListenHTTP processor run on
>> >>> >> Primary Node only and then use Site-to-Site to distribute the data
>> >>> >> to other nodes.
>> >>> Let's say I have a 5 node cluster and the ListenHTTP processor on the
>> >>> Primary node; is the data collected on the primary node not transferred
>> >>> to the other nodes for processing by default, despite all nodes being
>> >>> part of one cluster?
>> >>> If the ListenHTTP processor is running with the default setting (without
>> >>> explicitly setting it to run on the primary node), how does the data get
>> >>> transferred to the rest of the nodes? Does site-to-site come into play
>> >>> when I make one processor run on the primary node?
>> >>>
>> >>> Thanks,
>> >>> -Chakri
>> >>>
>> >>> From: Mark Payne <markap14@hotmail.com>
>> >>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
>> >>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> >>> Subject: Re: Nifi cluster features - Questions
>> >>>
>> >>> Hello Chakro,
>> >>>
>> >>> When you create a cluster of NiFi instances, each node in the cluster is
>> >>> acting independently and in exactly the same way. I.e., if you have 5
>> >>> nodes, all 5 nodes will run exactly the same flow. However, they will be
>> >>> pulling in different data and therefore operating on different data.
>> >>>
>> >>> So if you pull in 10 1-gig files from S3, each of those files will be
>> >>> processed on the node that pulled the data in. NiFi does not currently
>> >>> shuffle data around between nodes in the cluster (you can use
>> >>> site-to-site to do this if you want to, but it won't happen
>> >>> automatically). If you set the number of Concurrent Tasks to 5, then you
>> >>> will have up to 5 threads running for that processor on each node.
>> >>>
>> >>> The only exception to this is the Primary Node. You can schedule a
>> >>> Processor to run only on the Primary Node by right-clicking on the
>> >>> Processor, and going to the Configure menu. In the Scheduling tab, you
>> >>> can change the Scheduling Strategy to Primary Node Only. In this case,
>> >>> that Processor will only be triggered to run on whichever node is elected
>> >>> the Primary Node (this can be changed in the Cluster management screen by
>> >>> clicking the appropriate icon in the top-right corner of the UI).
>> >>>
>> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>> >>> on primary node only).
>> >>>
>> >>> If you are attempting to have a single input running HTTP and then push
>> >>> that out across the entire cluster to process the data, you would have a
>> >>> few options. First, you could just use an HTTP Load Balancer in front of
>> >>> NiFi. The other option would be to have a ListenHTTP processor run on
>> >>> Primary Node only and then use Site-to-Site to distribute the data to
>> >>> other nodes.
>> >>>
>> >>> For more info on site-to-site, you can see the Site-to-Site section of
>> >>> the User Guide at
>> >>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>> >>>
>> >>> If you have any more questions, let us know!
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>> >>> <Chakrader.Dewaragatla@lifelock.com> wrote:
>> >>>
>> >>> NiFi Team – I would like to understand the advantages of a NiFi
>> >>> clustering setup.
>> >>>
>> >>> Questions:
>> >>>
>> >>>  - How does a workflow work on multiple nodes? Does it share resources
>> >>> across nodes?
>> >>> Let's say I need to pull 10 1-Gig files from S3; how does the workload
>> >>> distribute? Setting concurrent tasks to 5, does it spawn 5 tasks per node?
>> >>>
>> >>>  - How do I "isolate" a processor to the master node (or one node)?
>> >>>
>> >>> - GetFile/PutFile processors in a cluster setup: do they get/put on the
>> >>> primary node? How do I force a processor to look at one of the slave nodes?
>> >>>
>> >>> - How can we have a workflow where, on the input side, we want to receive
>> >>> requests (HTTP) and then the rest of the pipeline needs to run in parallel
>> >>> on all the nodes?
>> >>>
>> >>> Thanks,
>> >>> -Chakro
>> >>>
>> >
>> >
>> >
>> > --
>> > Sent from Gmail Mobile
>
