nifi-dev mailing list archives

From <manoj.ses...@thomsonreuters.com>
Subject RE: Feature Request: Isolated Processors on ANY ONE node rather than on Primary node alone
Date Tue, 05 Apr 2016 12:46:48 GMT
Hi Mark - thanks for your prompt response. A few thoughts:

a) Currently, when Processor A is configured to run on the Primary Node, in the absence of
special configuration (e.g. the rest of the flow being configured as a Process Group), the downstream
Processors in the flow seem to automatically run on the Primary Node too. So in a sense, we
have the affinity or grouping of processors to a given node already, except this is limited
to the Primary Node. Could we not allow the scheduling of the Isolated Processor to occur
on ANY single node, rather than just the Primary node? That would suffice for our current
use case - i.e. we would be perfectly load balanced on initial ingest ACROSS the entire cluster,
even though the entire downstream flow would run on whichever node the isolated processor
was (randomly) scheduled on.
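
To make point (a) concrete, here is a minimal sketch of choosing ANY ONE node for an isolated processor. It is not NiFi code: the function name pick_node, the processor ID and the node names are all hypothetical, and it swaps the "random" choice for a hash of the processor ID so that every node would arrive at the same answer without coordination.

```python
import hashlib
from typing import List


def pick_node(processor_id: str, nodes: List[str]) -> str:
    """Choose a single node for an isolated processor (illustrative only).

    Assumption: every node sees the same set of connected node names.
    """
    if not nodes:
        raise ValueError("cluster has no connected nodes")
    # Sort so the result does not depend on the order nodes were listed in.
    ordered = sorted(nodes)
    # Hash the processor ID so different processors tend to land on different nodes.
    digest = hashlib.sha256(processor_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


if __name__ == "__main__":
    print(pick_node("processor-A", ["node-1", "node-2", "node-3"]))
```

With many isolated processors, a scheme like this would spread the initial ingest across the cluster instead of concentrating it on the Primary Node.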

b) That said, the "Scheduling Group" paradigm sounds very promising, if that includes the
ability to Group Processors/Flows, as well as restrict their running to Groups-of-nodes. It
is even more interesting if the concept can be coupled with multi-tenancy, so that cluster resources
(viz. the nodes) can be partitioned among, or isolated to, particular tenants.

Regards, Manoj 

Manoj Seshan - Senior Architect
Platform Content Technology, Bangalore

Voice: +91-9686578756  +91-80-67492572

-----Original Message-----
From: Mark Payne [mailto:markap14@hotmail.com] 
Sent: Tuesday, April 05, 2016 6:01 PM
To: dev@nifi.apache.org
Subject: Re: Feature Request: Isolated Processors on ANY ONE node rather than on Primary node alone

Manoj,

That is a very good point, and it is something that we are working toward.
However, it does get a little bit more complicated than this. If you have some Processor,
say Processor A running on some arbitrary node, there will often be times that you will also
need another Processor, Processor B, running on that same node.

Using a Primary Node means that we are able to accomplish this easily, but as you are noting
here, it is quite limiting. In version 1.0.0 of NiFi, one of the big changes is a Zero-Master
clustering design, whereby the Primary Node is automatically elected and fails over to a different
node whenever the Primary Node leaves the cluster. This improves the overall functionality
of Primary Node but does not address the issue here, of avoiding scheduling all "singleton"
processors on the same node.

I think the path that we'd like to take moving forward, post-1.0.0, is to provide a mechanism
that allows the user to schedule a Processor to run in some sort of named "Scheduling Group".
So, for instance, you could say Processor A and B should both run in "Group A" but Processor
C should run in "Group C". This way, we can ensure that Processors that need to run together
can do so while at the same time avoiding the need for all such processors to run on the same
node.
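
As a rough illustration of the "Scheduling Group" idea, the sketch below maps each named group to one node, so that processors sharing a group are co-located while different groups are spread round-robin across the cluster. This is only a sketch under assumptions, not a NiFi API or a committed design: assign_groups, the processor names and the node names are hypothetical.

```python
from typing import Dict, List


def assign_groups(processor_groups: Dict[str, str], nodes: List[str]) -> Dict[str, str]:
    """Map each processor to a node; same group means same node (illustrative only)."""
    if not nodes:
        raise ValueError("cluster has no connected nodes")
    ordered_nodes = sorted(nodes)
    group_names = sorted(set(processor_groups.values()))
    # Spread the groups round-robin over the nodes.
    group_to_node = {
        group: ordered_nodes[i % len(ordered_nodes)]
        for i, group in enumerate(group_names)
    }
    # Every processor follows its group's node.
    return {proc: group_to_node[group] for proc, group in processor_groups.items()}


if __name__ == "__main__":
    placement = assign_groups(
        {"Processor A": "Group A", "Processor B": "Group A", "Processor C": "Group C"},
        ["node-1", "node-2"],
    )
    print(placement)
```

In this example Processor A and Processor B end up on the same node, while Processor C is placed on a different one.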

Does this sound like a reasonable approach for your use case?

Thanks
-Mark

> On Apr 5, 2016, at 3:08 AM, <manoj.seshan@thomsonreuters.com> wrote:
> 
> For the purposes of symmetry of the NiFi Cluster, and so that the initial ingest of content
> is not limited to just one primary node in the NiFi cluster, would it not be beneficial for
> the framework to have the ability to schedule an Isolated Processor on ANY ONE of the available
> nodes in the NiFi Cluster?
>  
> Regards, Manoj
>  
> Manoj Seshan - Senior Architect
> Platform Content Technology, Bangalore
> 
> Voice: +91-9686578756  +91-80-67492572

