nifi-dev mailing list archives

From Alan Jackoway <al...@cloudera.com>
Subject Re: Feature Request: Isolated Processors on ANY ONE node rather than on Primary node alone
Date Tue, 05 Apr 2016 13:27:58 GMT
We have begun using a workaround for a problem similar to this, but it is
fairly ugly. In many cases we really want to run something like an
ingestion process from an external system at a specific time on one node.
Without https://issues.apache.org/jira/browse/NIFI-401 you can't quite do
it.

What we do instead is run a cron-scheduled GenerateFlowFile processor and
pipe it into RouteOnAttribute, where the attribute expressions look like
this:
* host1 - ${hostname():equals('host1.cloudera.com')}
* host2 - ${hostname():equals('host2.cloudera.com')}
...

Then we only connect the relationship for the node we want the code to run on.
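
To make the routing concrete, here is a small Python model of the idea. It is
only a sketch, not NiFi code: the function names route_on_hostname and
runs_downstream are made up for illustration, and the assumption is that every
node evaluates the same expressions against its own hostname while only one
relationship is wired to the rest of the flow.

```python
# Hypothetical model of the workaround, not NiFi code.
# Every node runs the same flow; only one relationship is connected downstream.

def route_on_hostname(node_hostname, candidate_hosts):
    """Return the relationship names whose check matches this node.

    Stands in for expressions such as ${hostname():equals('host1.cloudera.com')}.
    """
    return [name for name, fqdn in candidate_hosts.items() if node_hostname == fqdn]


def runs_downstream(node_hostname, candidate_hosts, connected):
    """A flow file continues only if it matched the connected relationship."""
    return connected in route_on_hostname(node_hostname, candidate_hosts)


hosts = {"host1": "host1.cloudera.com", "host2": "host2.cloudera.com"}
cluster = list(hosts.values())

# With "host1" connected, only host1 carries the work forward.
active = [n for n in cluster if runs_downstream(n, hosts, connected="host1")]
print(active)

# If host1 leaves the cluster, nothing runs until the connection is changed.
survivors = [n for n in cluster if n != "host1.cloudera.com"]
active_after_failure = [n for n in survivors if runs_downstream(n, hosts, connected="host1")]
print(active_after_failure)
```

The second print shows why the approach breaks so easily: the choice of node
lives in the flow's connections, so losing that node means rewiring by hand.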

The downside is that it is fragile. If one of those hosts goes down we have
to find everywhere that we chose it as the running node and change them to
some other host. Additionally there is no concept of choosing the node
based on load, so we have to make sure we spread out the work appropriately.

Scheduling Group - as long as it supports cron - sounds wonderful to me,
and I am looking forward to the solution. But if you need something now and
are willing to do the scheduling manually, the workaround above will do it.
Alan

On Tue, Apr 5, 2016 at 8:46 AM, <manoj.seshan@thomsonreuters.com> wrote:

> Hi Mark - thanks for your prompt response. A few thoughts ..
>
> a) Currently, when Processor A is configured to run on the Primary Node,
> in the absence of special configuration (e.g. to the rest of the flow
> configured as a Process Group), the downstream Processors in the flow seem
> to automatically run on the Primary Node too. So in a sense, we have the
> affinity or grouping of processors to a given node already, except this is
> limited to the Primary Node. Could we not allow the scheduling of the
> Isolated Processor to occur on ANY single node, rather than just the
> Primary node? That would suffice for our current use case - i.e. we would
> be perfectly load balanced on initial ingest ACROSS the entire cluster,
> even though the entire downstream flow would run on whichever node the
> isolated processor was (randomly) scheduled on.
>
> b) That said, the "Scheduling Group" paradigm sounds very promising, if
> that includes the ability to Group Processors/Flows, as well as restrict
> their running to Groups-of-nodes. It is even more interesting if the
> concept can be coupled with Multi-tenancy, so cluster-resources (viz. the
> nodes) can be partitioned/isolated-to particular tenants.
>
> Regards, Manoj
>
> Manoj Seshan - Senior Architect
> Platform Content Technology, Bangalore
>
> Voice: +91-9686578756  +91-80-67492572
>
> -----Original Message-----
> From: Mark Payne [mailto:markap14@hotmail.com]
> Sent: Tuesday, April 05, 2016 6:01 PM
> To: dev@nifi.apache.org
> Subject: Re: Feature Request: Isolated Processors on ANY ONE node rather
> than on Primary node alone
>
> Manoj,
>
> That is a very good point, and it is something that we are working toward.
> However, it does get a little bit more complicated than this. If you have
> some Processor, say Processor A running on some arbitrary node, there will
> often be times that you will also need another Processor, Processor B,
> running on that same node.
>
> Using a Primary Node means that we are able to accomplish this easily, but
> as you are noting here, it is quite limiting. In version 1.0.0 of NiFi, one
> of the big changes is a Zero-Master clustering design, whereby the Primary
> Node is automatically elected and fails over to a different node whenever
> the Primary Node leaves the cluster. This improves the overall
> functionality of Primary Node but does not address the issue here, of
> avoiding scheduling all "singleton" processors on the same node.
>
> I think the path that we'd like to take moving forward, post-1.0.0, is to
> provide a mechanism that allows the user to schedule a Processor to run in
> some sort of named "Scheduling Group". So, for instance, you could say
> Processor A and B should both run in "Group A" but Processor C should run
> in "Group C". This way, we can ensure that Processors that need to run
> together can do so while at the same time avoiding the need for all such
> processors to run on the same node.
>
> Does this sound like a reasonable approach for your use case?
>
> Thanks
> -Mark
>
> > On Apr 5, 2016, at 3:08 AM, <manoj.seshan@thomsonreuters.com> wrote:
> >
> > For the purposes of symmetry of the NiFi Cluster, and so that the
> > initial ingest of content is not limited to just one primary node in the
> > NiFi cluster, would it not be beneficial for the framework to have the
> > ability to schedule an Isolated Processor on ANY ONE of the available
> > nodes in the NiFi Cluster?
> >
> > Regards, Manoj
> >
> > Manoj Seshan - Senior Architect
> > Platform Content Technology, Bangalore
> >
> > Voice: +91-9686578756  +91-80-67492572
>
>
