nifi-users mailing list archives

From Matthew Clarke <matt.clarke....@gmail.com>
Subject Re: Nifi cluster features - Questions
Date Mon, 11 Jan 2016 21:50:54 GMT
Chakri,
        All data is received on the primary node only via the initial
ListenHTTP. Some routing takes place to send some data to a particular set of
5 nodes and other data to the other 5 nodes. Each PostHTTP processor is
configured to send to a specific node in your cluster, all using the same
target port number. A single ListenHTTP processor then runs on every node,
configured to use that target port number.
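
As an illustrative sketch of that wiring (the hostnames and port below are
made-up placeholders, and the base path shown is simply the ListenHTTP
default):

  PostHTTP #1 (primary)   URL = http://node01:<port>/contentListener
  PostHTTP #2 (primary)   URL = http://node02:<port>/contentListener
  ...
  ListenHTTP (all nodes)  Listening Port = <port>, Base Path = contentListener

Each routed subset of the incoming data is connected to the PostHTTP that
points at the node you want to handle it.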

Thanks,
Matt

On Mon, Jan 11, 2016 at 4:47 PM, Matthew Clarke <matt.clarke.138@gmail.com>
wrote:

> Chakri,
>             What Mark is saying is NiFI Remote Process Group (RPG) also
> known as Site-to-Site will load-balance delivery data to all nodes in a
> cluster.  It can not be configured to balance data to only a subset of a
> nodes in a cluster.  If this is the strategy you want to deploy, a
> different approach must be taken (one that does not use Site-to-Site).
> Here is a NiFI diagram of one such approach using your example of a 10 node
> cluster:
>
> [image: Inline image 1]
>
>
>
> On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com> wrote:
>
>> Mark - Correct me if I understood this right.
>>
>> Curl POST from some application —> ListenHTTP (on primary
>> node) —> PostHTTP with the data FlowFile (on primary node?) —> Post to
>> site-to-site end point —> This in turn distributes load to both slaves.
>>
>> Thanks,
>> -Chakri
>>
>> From: Mark Payne <markap14@hotmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Monday, January 11, 2016 at 12:29 PM
>>
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> At this time, your only options are to run the processors on all nodes or
>> a single node (Primary Node). There's no way to really group nodes together
>> and say "only run on this set of nodes."
>>
>> One option is to have a ListenHTTP Processor and then push data to that
>> NiFi via PostHTTP (configure it to send FlowFile attributes along). By
>> doing this, you could set up the sending NiFi
>> to only deliver data to two nodes. You could then have a different set of
>> data going to a different two nodes, etc. by the way that you configure
>> which data goes to which PostHTTP Processor.
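>>
>> As a quick sanity check of that path (the hostname and port below are
>> placeholders, and the base path shown is ListenHTTP's default), you could
>> post a test file straight to one node's listener:
>>
>>   curl -v -X POST --data-binary @test.txt http://<node-host>:<listen-port>/contentListener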
>>
>> Does this give you what you need?
>>
>>
>> On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>
>> Thanks Mark. I will look into it.
>>
>> Couple of questions:
>>
>>
>>    - Going back to my earlier question: in a NiFi cluster with two
>>    slaves and an NCM, how do I make the two slaves accept and process the
>>    incoming FlowFile in a distributed fashion? Is site-to-site the only way
>>    to go? In our use case, we have an HTTP listener running on the primary
>>    node, and the PutFile processor should run on the two slaves in a
>>    distributed fashion.
>>
>>    It is more like a new (or existing) feature.
>>     - In a NiFi cluster setup, can we group the machines and point
>>    site-to-site at an individual group?
>>     For instance, if I have a 10 node cluster, can I group it into 5 groups
>>    with two nodes each and run processors on a dedicated group (using
>>    site-to-site or other means)?
>>
>> Thanks,
>> -Chakri
>>
>> From: Mark Payne <markap14@hotmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Monday, January 11, 2016 at 5:24 AM
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> This line in the logs is particularly interesting (on primary node):
>>
>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
>> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>>
>>
>> This indicates that all of the site-to-site data will go to the host
>> i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node
>> listed, this means
>> that the NCM responded, indicating that this is the only node in the
>> cluster that is currently connected and has site-to-site enabled. Can you
>> double-check the nifi.properties
>> file on the Primary Node and verify that the
>> "nifi.remote.input.socket.port" property is specified, and that the
>> "nifi.remote.input.secure" property is set to "false"?
>> Of note is that if the "nifi.remote.input.secure" property is set to
>> true, but keystore and truststore are not specified, then site-to-site will
>> be disabled (there would be a warning
>> in the log in this case).
>>
>> If you can verify that both of those properties are set properly on both
>> nodes, then we can delve in further, but probably best to start by
>> double-checking the easy things :)
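>>
>> One quick way to check this, assuming the standard conf/ layout, is to run
>> the following on each node:
>>
>>   grep 'nifi.remote.input' conf/nifi.properties
>>
>> The socket port should be set to a real port and secure should be false.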
>>
>> Thanks
>> -Mark
>>
>>
>> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>
>> Bryan – Here are the logs:
>> I have a 5 sec flow file schedule.
>>
>> On primary node (No data coming in)
>>
>> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
>> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>> pending = false
>> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
>> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1
>> threads
>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
>> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>> pending = false
>> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
>> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7]
>> o.a.nifi.remote.StandardRemoteGroupPort
>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>> Successfully sent
>> [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>> section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]]
>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
>> milliseconds at a rate of 392 bytes/sec
>> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1]
>> o.a.nifi.remote.StandardRemoteGroupPort
>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>> Successfully sent
>> [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>> section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]]
>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51
>> milliseconds at a rate of 391 bytes/sec
>> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Received request
>> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
>> o.a.nifi.controller.StandardFlowService Received flow request message from
>> manager.
>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes)
>> in 61 millis
>> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
>> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3]
>> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
>> for events starting with ID 301
>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
>> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
>> (6 records) into single Provenance Log File
>> ./provenance_repository/295.prov in 111 milliseconds
>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
>> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
>> Event file containing 8 records
>> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10]
>> o.a.nifi.remote.StandardRemoteGroupPort
>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>> Successfully sent
>> [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>> section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]]
>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
>> milliseconds at a rate of 392 bytes/sec
>> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
>> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>> StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa]
>> to run
>> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
>> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
>> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
>> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>> pending = false
>> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
>>
>>
>> On Secondary node (Data coming in)
>>
>> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1]
>> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
>> Repository with 0 records in 88 milliseconds
>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3]
>> o.a.n.r.p.s.SocketFlowFileServerProtocol
>> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
>> Successfully received
>> [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]]
>> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
>> at a rate of 387 bytes/sec
>> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1]
>> o.a.nifi.processors.standard.PutFile
>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
>> StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]
>> at location /root/putt/275243509297560
>> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3]
>> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
>> for events starting with ID 17037
>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
>> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
>> (6 records) into single Provenance Log File
>> ./provenance_repository/17031.prov in 56 milliseconds
>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
>> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
>> Event file containing 10 records
>> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Received request
>> e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
>> o.a.nifi.controller.StandardFlowService Received flow request message from
>> manager.
>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>> e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes)
>> in 76 millis
>> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
>> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6]
>> o.a.n.r.p.s.SocketFlowFileServerProtocol
>> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
>> Successfully received
>> [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]]
>> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
>> at a rate of 387 bytes/sec
>> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8]
>> o.a.nifi.processors.standard.PutFile
>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
>> StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]
>> at location /root/putt/275248510432074
>> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
>> From: Bryan Bende <bbende@gmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Sunday, January 10, 2016 at 2:43 PM
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> Glad you got site-to-site working.
>>
>> Regarding the data distribution, I'm not sure why it is behaving that
>> way. I just did a similar test running ncm, node1, and node2 all on my
>> local machine, with GenerateFlowFile running every 10 seconds, and Input
>> Port going to a LogAttribute, and I see it alternating between node1 and
>> node2 logs every 10 seconds.
>>
>> Is there anything in your primary node logs
>> (primary_node/logs/nifi-app.log) when you see the data on the other node?
>>
>> -Bryan
>>
>>
>> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <joe.witt@gmail.com> wrote:
>>
>>> Chakri,
>>>
>>> Would love to hear what you've learned and how that differed from the
>>> docs themselves.  Site-to-site has proven difficult to set up, so we're
>>> clearly not there yet in having the right operator/admin experience.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
>>> <Chakrader.Dewaragatla@lifelock.com> wrote:
>>> > I was able to get site-to-site working.
>>> > I tried to follow your instructions to send data distributed across the
>>> > nodes.
>>> >
>>> > GenerateFlowFile (On Primary) —> RPG
>>> > RPG —> Input Port —> PutFile (Timer driven scheduling)
>>> >
>>> > However, data is only written to one slave (the secondary slave). The
>>> > primary slave has no data.
>>> >
>>> > Image screenshot :
>>> > http://tinyurl.com/jjvjtmq
>>> >
>>> > From: Chakrader Dewaragatla <chakrader.dewaragatla@lifelock.com>
>>> > Date: Sunday, January 10, 2016 at 11:26 AM
>>> >
>>> > To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> > Subject: Re: Nifi cluster features - Questions
>>> >
>>> > Bryan – Thanks – I am trying to set up site-to-site.
>>> > I have two slaves and one NCM.
>>> >
>>> > My properties as follows :
>>> >
>>> > On both Slaves:
>>> >
>>> > nifi.remote.input.socket.port=10880
>>> > nifi.remote.input.secure=false
>>> >
>>> > On NCM:
>>> > nifi.remote.input.socket.port=10880
>>> > nifi.remote.input.secure=false
>>> >
>>> > When I try to drop a remote process group (with http://<NCM
>>> > IP>:8080/nifi), I see
>>> > errors as follows for the two nodes.
>>> >
>>> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
>>> > communication
>>> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
>>> > communication
>>> >
>>> > Do you have insight into why it's trying to connect to 8080 on the
>>> > slaves? When does the 10880 port come into the picture? I remember trying
>>> > to set up site-to-site a few months back and succeeding.
>>> >
>>> > Thanks,
>>> > -Chakri
>>> >
>>> >
>>> >
>>> > From: Bryan Bende <bbende@gmail.com>
>>> > Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> > Date: Saturday, January 9, 2016 at 11:22 AM
>>> > To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> > Subject: Re: Nifi cluster features - Questions
>>> >
>>> > The sending node (where the remote process group is) will distribute the
>>> > data evenly across the two nodes, so an individual file will only be sent
>>> > to one of the nodes. You could think of it as if a separate NiFi instance
>>> > were sending directly to a two node cluster; it would be evenly
>>> > distributing the data across the two nodes. In this case it just so
>>> > happens to all be within the same cluster.
>>> >
>>> > The most common use case for this scenario is the List and Fetch
>>> > processors, like those for HDFS. You can perform the listing on the
>>> > primary node, and then distribute the results so the fetching takes place
>>> > on all nodes.
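>>> >
>>> > As a sketch of that pattern, using the HDFS processors as the example:
>>> >
>>> >   ListHDFS (primary node only) —> RPG —> Input Port —> FetchHDFS (all nodes)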
>>> >
>>> > On Saturday, January 9, 2016, Chakrader Dewaragatla
>>> > <Chakrader.Dewaragatla@lifelock.com> wrote:
>>> >>
>>> >> Bryan – Thanks, how do the nodes distribute the load for an input port?
>>> >> As the port is open and listening on two nodes, does it copy the same
>>> >> files onto both nodes?
>>> >> I need to try this setup to see the results; appreciate your help.
>>> >>
>>> >> Thanks,
>>> >> -Chakri
>>> >>
>>> >> From: Bryan Bende <bbende@gmail.com>
>>> >> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> >> Date: Friday, January 8, 2016 at 3:44 PM
>>> >> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> >> Subject: Re: Nifi cluster features - Questions
>>> >>
>>> >> Hi Chakri,
>>> >>
>>> >> I believe the DistributeLoad processor is more for load balancing when
>>> >> sending to downstream systems. For example, if you had two HTTP endpoints,
>>> >> you could have the first relationship from DistributeLoad going to a
>>> >> PostHTTP that posts to endpoint #1, and the second relationship going to
>>> >> a second PostHTTP that goes to endpoint #2.
>>> >>
>>> >> If you want to distribute the data within the cluster, then you need to
>>> >> use site-to-site. The way you do this is the following...
>>> >>
>>> >> - Add an Input Port connected to your PutFile.
>>> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
>>> >> Remote Process Group. The Remote Process Group should be connected to the
>>> >> Input Port from the previous step.
>>> >>
>>> >> So both nodes have an input port listening for data, but only the primary
>>> >> node produces a FlowFile and sends it to the RPG, which then
>>> >> re-distributes it back to one of the Input Ports.
>>> >>
>>> >> In order for this to work you need to set nifi.remote.input.socket.port
>>> >> in nifi.properties to some available port, and you probably want
>>> >> nifi.remote.input.secure=false for testing.
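>>> >>
>>> >> For example, in conf/nifi.properties on each node (the port here is just
>>> >> an example; any free port works):
>>> >>
>>> >>   nifi.remote.input.socket.port=10880
>>> >>   nifi.remote.input.secure=false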
>>> >>
>>> >> -Bryan
>>> >>
>>> >>
>>> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>>> >> <Chakrader.Dewaragatla@lifelock.com> wrote:
>>> >>>
>>> >>> Mark – I have set up a two node cluster and tried the following:
>>> >>>
>>> >>>  GenerateFlowFile processor (run only on primary node) —> DistributeLoad
>>> >>> processor (RoundRobin) —> PutFile
>>> >>>
>>> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it
>>> >>> >> to run on primary node only).
>>> >>> From your comment above, it should put the file on both nodes. It puts
>>> >>> files on the primary node only. Any thoughts?
>>> >>>
>>> >>> Thanks,
>>> >>> -Chakri
>>> >>>
>>> >>> From: Mark Payne <markap14@hotmail.com>
>>> >>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>> >>>
>>> >>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> >>> Subject: Re: Nifi cluster features - Questions
>>> >>>
>>> >>> Chakri,
>>> >>>
>>> >>> Correct - when NiFi instances are clustered, they do not transfer data
>>> >>> between the nodes. This is very different
>>> >>> from what you might expect from something like Storm or Spark, as the key
>>> >>> goals and design are quite different.
>>> >>> We have discussed providing the ability to allow the user to indicate
>>> >>> that they want to have the framework
>>> >>> do load balancing for specific connections in the background, but it's
>>> >>> still in more of a discussion phase.
>>> >>>
>>> >>> Site-to-Site is simply the capability that we have developed to transfer
>>> >>> data between one instance of
>>> >>> NiFi and another instance of NiFi. So currently, if we want to do load
>>> >>> balancing across the cluster, we would
>>> >>> create a site-to-site connection (by dragging a Remote Process Group onto
>>> >>> the graph) and give that
>>> >>> site-to-site connection the URL of our cluster. That way, you can push
>>> >>> data to your own cluster, effectively
>>> >>> providing a load balancing capability.
>>> >>>
>>> >>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> >>> then every node in the cluster will be listening
>>> >>> for incoming HTTP connections. So you could then use a simple load
>>> >>> balancer in front of NiFi to distribute the load
>>> >>> across your cluster.
>>> >>>
>>> >>> Does this help? If you have any more questions we're happy to help!
>>> >>>
>>> >>> Thanks
>>> >>> -Mark
>>> >>>
>>> >>>
>>> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> >>> <Chakrader.Dewaragatla@lifelock.com> wrote:
>>> >>>
>>> >>> Mark - Thanks for the notes.
>>> >>>
>>> >>> >> The other option would be to have a ListenHTTP processor run on
>>> >>> >> Primary Node only and then use Site-to-Site to distribute the
>>> >>> >> data to other nodes.
>>> >>> Let's say I have a 5 node cluster and a ListenHTTP processor on the
>>> >>> primary node; collected data on the primary node is not transferred to
>>> >>> the other nodes by default for processing, despite all nodes being part
>>> >>> of one cluster?
>>> >>> If the ListenHTTP processor is running as a default (without explicitly
>>> >>> setting it to run on the primary node), how does the data get transferred
>>> >>> to the rest of the nodes? Does site-to-site come into play when I make
>>> >>> one processor run on the primary node?
>>> >>> on primary node ?
>>> >>>
>>> >>> Thanks,
>>> >>> -Chakri
>>> >>>
>>> >>> From: Mark Payne <markap14@hotmail.com>
>>> >>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> >>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> >>> Subject: Re: Nifi cluster features - Questions
>>> >>>
>>> >>> Hello Chakro,
>>> >>>
>>> >>> When you create a cluster of NiFi instances, each node in the cluster is
>>> >>> acting independently and in exactly
>>> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>>> >>> same flow. However, they will be
>>> >>> pulling in different data and therefore operating on different data.
>>> >>>
>>> >>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> >>> processed on the node that pulled the data
>>> >>> in. NiFi does not currently shuffle data around between nodes in the
>>> >>> cluster (you can use site-to-site to do
>>> >>> this if you want to, but it won't happen automatically). If you set the
>>> >>> number of Concurrent Tasks to 5, then
>>> >>> you will have up to 5 threads running for that processor on each node.
>>> >>>
>>> >>> The only exception to this is the Primary Node. You can schedule a
>>> >>> Processor to run only on the Primary Node
>>> >>> by right-clicking on the Processor, and going to the Configure menu. In
>>> >>> the Scheduling tab, you can change
>>> >>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> >>> Processor will only be triggered to run on
>>> >>> whichever node is elected the Primary Node (this can be changed in the
>>> >>> Cluster management screen by clicking
>>> >>> the appropriate icon in the top-right corner of the UI).
>>> >>>
>>> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> >>> on primary node only).
>>> >>>
>>> >>> If you are attempting to have a single input running HTTP and then push
>>> >>> that out across the entire cluster to
>>> >>> process the data, you would have a few options. First, you could just use
>>> >>> an HTTP Load Balancer in front of NiFi.
>>> >>> The other option would be to have a ListenHTTP processor run on Primary
>>> >>> Node only and then use Site-to-Site
>>> >>> to distribute the data to other nodes.
>>> >>>
>>> >>> For more info on site-to-site, you can see the Site-to-Site section of
>>> >>> the User Guide at
>>> >>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>> >>>
>>> >>> If you have any more questions, let us know!
>>> >>>
>>> >>> Thanks
>>> >>> -Mark
>>> >>>
>>> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> >>> <Chakrader.Dewaragatla@lifelock.com> wrote:
>>> >>>
>>> >>> NiFi Team – I would like to understand the advantages of a NiFi
>>> >>> clustering setup.
>>> >>>
>>> >>> Questions:
>>> >>>
>>> >>>  - How does a workflow work on multiple nodes? Does it share resources
>>> >>> between nodes?
>>> >>> Let's say I need to pull 10 1-gig files from S3; how does the workload
>>> >>> get distributed? Setting concurrent tasks to 5, does it spawn 5 tasks
>>> >>> per node?
>>> >>>
>>> >>>  - How do I “isolate” a processor to the master node (or one node)?
>>> >>>
>>> >>> - GetFile/PutFile processors in a cluster setup: do they get/put on the
>>> >>> primary node? How do I force a processor to look at one of the slave nodes?
>>> >>>
>>> >>> - How can we have a workflow where, on the input side, we want to receive
>>> >>> requests (HTTP) and then the rest of the pipeline needs to run in parallel
>>> >>> on all the nodes?
>>> >>>
>>> >>> Thanks,
>>> >>> -Chakro
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Sent from Gmail Mobile
>>>
>>
>>
>
>
