nifi-users mailing list archives

From Aldrin Piri <aldrinp...@gmail.com>
Subject Re: New to NiFi and interested on clustering capabilities
Date Thu, 30 Apr 2015 06:09:41 GMT
David,

As to your first question, the way files are distributed in the cluster is
determined primarily by their point of ingress.  More specifically, if a file
is pulled in by a given processor on one of your nodes, the data continues its
full journey through the configured data flow on that same physical system.
Given the context of your second inquiry, I'd imagine a possible pain point
would be having a single central point introducing all the files to your
system.  One way to get better distribution is a fan-out approach.  Use your
single, isolated processor to introduce data to the system, then use a
DistributeLoad processor to route 1/n of your data to each of the systems in
your n-node cluster.  Sending data to the remote systems could be accomplished
with n-1 PostHTTP processors, each configured to point to one of the other
instances, where the data is received by a ListenHTTP.  The "local" system
would be able to bypass the network send/receive and just go straight into the
flow.  This is not ideal for what I perceive to be your case, and there is an
associated issue requesting a feature to support such balancing
automatically [2].  Any additional thoughts you have on the issue would be
appreciated.
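
To make the 1/n split concrete, here is a minimal sketch in plain Python (not
NiFi code) of the fan-out described above.  It assumes a simple round-robin
assignment; the node names, file names, and the distribute_round_robin helper
are hypothetical and only illustrate the idea, not how DistributeLoad is
implemented.

```python
from collections import defaultdict
from itertools import cycle


def distribute_round_robin(files, nodes):
    """Assign each file to the next node in turn, so every node
    ends up with roughly 1/n of the files (illustrative only)."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = defaultdict(list)
    for file_name, node in zip(files, cycle(nodes)):
        assignment[node].append(file_name)
    return dict(assignment)


# Hypothetical 3-node cluster; node-1 is the node that ingests everything.
nodes = ["node-1", "node-2", "node-3"]
files = [f"file-{i}.dat" for i in range(9)]

plan = distribute_round_robin(files, nodes)
for node in nodes:
    # The ingesting node keeps its share locally; the other n-1 shares
    # would each travel over the network (PostHTTP to ListenHTTP).
    route = "local flow" if node == nodes[0] else "PostHTTP -> ListenHTTP"
    print(node, route, plan.get(node, []))
```

With three nodes, two of the three shares cross the network, which is why a
single ingest point can still become the bottleneck at very high volumes.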


The isolation mode you are looking for is available in a clustered flow via
the processor configuration.  Select the scheduling tab, and for scheduling
strategy, select "On primary node."  Just to call it out explicitly,
this option will only show up in a clustered flow.

There is at least one related ticket to this subject, NIFI-401 [1].  If
this particular method doesn't quite meet your use case, we would
definitely like to hear about suggestions or opinions on how to make this
better.

[1] https://issues.apache.org/jira/browse/NIFI-401
[2] https://issues.apache.org/jira/browse/NIFI-337

On Wed, Apr 29, 2015 at 1:49 PM, David Klim <davidklmlg@hotmail.com> wrote:

> Thank you all for the information :)
>
> There is some detail I am missing which is how a defined flow gets
> partitioned across the nodes in the cluster. The now updated doc states "the
> same dataflow runs on all the nodes. As a result, every component in the
> flow runs on every node". How the files are partitioned to be collected
> by different nodes is relevant for the solution I am working on (at least
> could have implications on the definition of the dataflow itself) so I
> would like to dig in here.
>
> The doc also says "the DFM could configure the GetSFTP on the Primary
> Node to run in isolation, meaning that it only runs on that node".  I was
> trying to find this "isolation" configuration but no luck. Any hints? :-)
>
> Thanks again!
>
>
> ------------------------------
> Date: Wed, 29 Apr 2015 08:28:02 -0400
> Subject: Re: New to NiFi and interested on clustering capabilities
> From: matt.c.gilman@gmail.com
> To: users@nifi.incubator.apache.org
>
>
> Anup,
>
> That section is still incomplete unfortunately. We are definitely pushing
> the documentation at the moment. Personally, I am working through getting
> our REST endpoints documented. I know another committer has been working on
> the contribution guide as well as some introductory NiFi quick start
> guides. I can provide some quick points here in the meantime.
>
> In the section for web properties you'll want to configure the 'https'
> properties instead of the 'http' properties.
>
> nifi.web.http.host=
> nifi.web.http.port=
> nifi.web.https.host=
> nifi.web.https.port=
>
> Then further down you'll need to configure the security properties.
>
> nifi.security.keystore=
> nifi.security.keystoreType=
> nifi.security.keystorePasswd=
> nifi.security.keyPasswd=
> nifi.security.truststore=
> nifi.security.truststoreType=
> nifi.security.truststorePasswd=
> nifi.security.needClientAuth=
>
> These will define the certificates that are used by the web server (and
> cluster and site to site communications). You will need to configure all
> the keystore properties and truststore properties (if keyPasswd is not
> configured the keystorePasswd will be tried as the keyPasswd). If you set
> needClientAuth to false, clients will be required to trust the keystore
> configured here. User access will still be anonymous however. If you set
> needClientAuth to true, clients will need to have certificates loaded in
> their browser that are trusted by the truststore configured here. User
> access will be considered using the DN from their certificate and the
> authorization provider.
>
> NiFi supports pluggable authorization which is only necessary if
> needClientAuth is set to true. By default it's configured with a file-based
> solution.
>
> nifi.security.user.authority.provider=file-provider
>
> Details on setting up this file and controlling the level of access have
> started being discussed here [1].
>
> Hope this helps while we get more detailed documentation written up.
> Thanks.
>
> Matt
>
> [1]
> https://nifi.incubator.apache.org/docs/nifi-docs/administration-guide.html#controlling-levels-of-access
>
>
> On Wed, Apr 29, 2015 at 7:13 AM, Sethuram, Anup <anup.sethuram@philips.com> wrote:
>
>  Hi David,
> Is the “Security Configuration” added in the latest admin guide?
>
>  Regards,
> anup
>
>   From: Matt Gilman <matt.c.gilman@gmail.com>
> Reply-To: "users@nifi.incubator.apache.org" <users@nifi.incubator.apache.org>
> Date: Wednesday, 29 April 2015 12:03 am
> To: "users@nifi.incubator.apache.org" <users@nifi.incubator.apache.org>
> Subject: Re: New to NiFi and interested on clustering capabilities
>
>   David,
>
>  Welcome and thanks for expressing interest in Apache NiFi. I just
> noticed that the administrator guide [1] on our website [2] was not in its
> current form so just uploaded the latest version. The document now includes
> a quick explanation of our clustering capabilities and example
> configurations. This would be a great place to start and become familiar
> with NiFi clustering. Please let us know if you have any follow up
> questions.
>
>  Also, if you had already viewed the administrator guide your browser may
> have cached the older version so you may need to do a hard reload.
>
>  [1]
> https://nifi.incubator.apache.org/docs/nifi-docs/administration-guide.html
> [2] https://nifi.incubator.apache.org/
>
> On Tue, Apr 28, 2015 at 2:06 PM, David Klim <davidklmlg@hotmail.com>
> wrote:
>
>  Hello,
>
>  Just joined the list, I am evaluating NiFi for a large project to see if
> NiFi would fit as the main data collector. So far I am quite impressed with
> its capabilities; the concept is just great!
>
>  The project I am working on would require retrieving several hundreds of
> millions of files per day (hundreds of TB per day) so my first question is
> how to achieve distribution/clustering with NiFi, if that's possible.
>
>  Thanks in advance!
>
