nifi-dev mailing list archives

From Sivaprasanna <sivaprasanna...@gmail.com>
Subject Re: Implementation of ListFile's Primary Node only in a cluster
Date Fri, 23 Feb 2018 16:58:37 GMT
I have started working on an annotation that a developer can use to
indicate that a processor is supposed to run only on the 'Primary node'.
The framework side of things works just fine. However, on the UI side I
have a couple of questions and issues:

   1. nf-processor-details.js
<https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-ui/src/main/webapp/js/nf/nf-processor-details.js#L220>
and nf-processor-configuration.js
<https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-ui/src/main/webapp/js/nf/canvas/nf-processor-configuration.js#L745>
check whether the setup 'isClustered' or 'executionNode === PRIMARY', which
confuses me. Checking 'nfClusterSummary.isClustered()' alone is enough,
right? Because we also check 'executionNode === PRIMARY', the
'execution-node-options' will be rendered for processors marked with this
annotation even on a single-instance (i.e. non-clustered) NiFi.
   2. In order to avoid this, I made a change to the code and removed the
'executionNode === PRIMARY' condition check in the mentioned files. Even
after that, 'execution-node-options' is being rendered. Am I missing
something?
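
To make the concern in point 1 concrete, here is a minimal Java model of
the two visibility rules. It is only a sketch: the real check lives in the
JavaScript files linked above, and every name below
(ExecutionNodeOptionVisibility, currentRule, proposedRule) is hypothetical
and not taken from the NiFi code base.

```java
public class ExecutionNodeOptionVisibility {

    // Hypothetical stand-in for the processor's execution node setting.
    enum ExecutionNode { ALL, PRIMARY }

    // Rule as described in point 1: show the options when the instance is
    // clustered OR the processor's execution node is PRIMARY.
    static boolean currentRule(boolean clustered, ExecutionNode node) {
        return clustered || node == ExecutionNode.PRIMARY;
    }

    // Rule proposed in point 1: show the options only when clustered.
    static boolean proposedRule(boolean clustered, ExecutionNode node) {
        return clustered;
    }

    public static void main(String[] args) {
        // Print both rules for every combination of inputs.
        for (boolean clustered : new boolean[] {false, true}) {
            for (ExecutionNode node : ExecutionNode.values()) {
                System.out.printf("clustered=%b node=%s current=%b proposed=%b%n",
                        clustered, node,
                        currentRule(clustered, node),
                        proposedRule(clustered, node));
            }
        }
    }
}
```

The two rules differ only for a non-clustered instance whose executionNode
is PRIMARY, which is exactly the case a processor marked with the
annotation would hit.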

I have pushed these changes to my remote repo. Here is the link:
https://github.com/zenfenan/nifi/commit/e09e85960fb394eeef89d9cb6aa7acdfc5d4dad3

BTW, right now I have implemented it this way: if the annotation is
present, the executionNode is set to 'PRIMARY' at the time of processor
creation/instantiation. However, this can be changed later by configuring
the processor from the UI. Should we think about disabling the 'Execution
Node' configuration altogether (from the UI) for a processor marked with
this annotation? That makes more sense to me, but it does restrict the
users' freedom to choose as they wish.
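
As a rough illustration of the two behaviours in question (default only
vs. enforced), here is a self-contained Java sketch. It does not use the
NiFi framework classes. The annotation name follows the @PrimaryNodeOnly
suggestion quoted below, and everything else (PrimaryNodeOnlySketch,
defaultExecutionNode, resolveExecutionNode, the two dummy processor
classes) is a hypothetical name made up for this example.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class PrimaryNodeOnlySketch {

    // Hypothetical stand-in for the processor's execution node setting.
    enum ExecutionNode { ALL, PRIMARY }

    // Marker annotation; RUNTIME retention so it can be read via reflection.
    @Documented
    @Inherited
    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @interface PrimaryNodeOnly { }

    // Dummy processors used only to exercise the sketch.
    @PrimaryNodeOnly
    static class AnnotatedProcessor { }

    static class PlainProcessor { }

    // Current behaviour described above: the annotation only picks the
    // initial value at creation time.
    static ExecutionNode defaultExecutionNode(Class<?> processorClass) {
        return processorClass.isAnnotationPresent(PrimaryNodeOnly.class)
                ? ExecutionNode.PRIMARY
                : ExecutionNode.ALL;
    }

    // With enforce=false the user's later choice wins (current behaviour);
    // with enforce=true an annotated processor always stays on PRIMARY.
    static ExecutionNode resolveExecutionNode(Class<?> processorClass,
                                              ExecutionNode requested,
                                              boolean enforce) {
        if (enforce && processorClass.isAnnotationPresent(PrimaryNodeOnly.class)) {
            return ExecutionNode.PRIMARY;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(defaultExecutionNode(AnnotatedProcessor.class));
        System.out.println(defaultExecutionNode(PlainProcessor.class));
        System.out.println(resolveExecutionNode(AnnotatedProcessor.class, ExecutionNode.ALL, false));
        System.out.println(resolveExecutionNode(AnnotatedProcessor.class, ExecutionNode.ALL, true));
    }
}
```

The enforce flag is where the open question sits: without it the
annotation is just a default, and with it the UI option would effectively
be read-only for annotated processors.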


On Sun, Feb 11, 2018 at 12:59 AM, Bryan Bende <bbende@gmail.com> wrote:

> Currently it means that the dataflow manager/developer is expected to
> set the 'Execution Nodes' strategy to "Primary Node" at the time of
> flow design.
>
> We don't have anything that restricts the scheduling strategy of a
> processor, but we probably should consider having an annotation like
> @PrimaryNodeOnly that you can put on a processor and then the
> framework will enforce that it can only be scheduled on primary node.
>
> In the case of ListFile, I think the statement in the documentation is
> only partially true...
>
> When "Input Directory Location" is set to local, there should be no
> issue with scheduling the processor on all nodes in the cluster, as it
> would be listing a local directory and storing state locally.
>
> When "Input Directory Location" is set to remote, it wouldn't make
> sense to have all nodes listing the same remote directory and getting
> the same results, and also the state is then stored in ZooKeeper under
> a ZNode using the processor's UUID, and the processor has the same
> UUID on each node so they would be overwriting each other's state in
> ZK.
>
> So ListFile probably can't be restricted to primary node only, whereas
> something like ListHDFS probably could because it is always listing
> a remote destination.
>
>
> On Fri, Feb 9, 2018 at 10:55 PM, Sivaprasanna <sivaprasanna246@gmail.com>
> wrote:
> > I was going through the ListFile processor's code and found that the
> > documentation
> > <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListFile.java#L72-L76>
> > mentions that "this processor is designed to run on Primary Node
> > only in a cluster". I want to understand what "designed" stands for here.
> > Does it mean the processor was built so that it only runs on the
> > Primary node even if the "Execution Nodes" strategy is set otherwise,
> > or does it mean that the dataflow manager/developer is expected to set the
> > 'Execution Nodes' strategy to "Primary Node" at the time of flow design?
> > In the former case, how is it handled in the code? If it is handled,
> > it should be on the framework side, but I don't see any annotation
> > in the processor code indicating such a mechanism. Moreover, a related
> > JIRA, NIFI-543 <https://issues.apache.org/jira/browse/NIFI-543>, is still
> > open, so I want to clear my doubt.
> >
> > -
> > Sivaprasanna
>
