nifi-users mailing list archives

From Pierre Villard <pierre.villard...@gmail.com>
Subject Re: Nifi capabilities
Date Mon, 12 Aug 2019 12:57:18 GMT
It mainly depends on your workload. NiFi is not memory-hungry unless you
perform specific operations on the data or use memory-intensive
processors. For high performance you'd likely go for CPU-optimized VMs with
attached SSDs for the repositories. But my recommendation is to start small
and adapt your setup based on your needs and observations.
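
To illustrate the point about putting the repositories on attached SSDs,
here is a rough sketch of the relevant nifi.properties entries. The property
names are the standard repository directory settings; the /mnt/ssd* paths
are hypothetical mount points that only stand in for wherever your volumes
are attached. Treat it as a starting point, not a tuned configuration.

```properties
# Sketch only: the /mnt/ssd* paths are hypothetical mount points, not defaults.
# Giving each repository its own volume keeps their I/O from competing.
nifi.flowfile.repository.directory=/mnt/ssd1/flowfile_repository
nifi.content.repository.directory.default=/mnt/ssd2/content_repository
nifi.provenance.repository.directory.default=/mnt/ssd3/provenance_repository
```

Whether separate volumes are worth it depends on the flow, so it is best
checked against observed disk usage once the pipelines are running.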

On Mon, 12 Aug 2019 at 14:46, Dweep Sharma <dweep.sharma@redbus.com>
wrote:

> Thanks,
>
> We are almost entirely on the AWS cloud, where hardware/OS failures are
> very unlikely.
>
> Can you please suggest a machine type on AWS? I am considering m5.xlarge.
>
> I need to choose a machine type with the following priorities:
> 1) High Disk I/O
> 2) Memory
> 3) CPU
>
> -Dweep
>
> On Mon, Aug 5, 2019 at 5:32 PM Purushotham Pushpavanthar <
> pushpavanthar@gmail.com> wrote:
>
>> Hi Dweep,
>>
>> I would like to add to Pierre Villard's insightful answer.
>> 2) NiFi has at least three filesystem repositories, so the same record is
>> written and read multiple times at different stages of a single pipeline.
>> This demands high IOPS. Scaling IOPS vertically is very costly and
>> sometimes hits a roadblock, which clustered mode handles better by load
>> balancing flowfiles across nodes.
>>
>> Regards,
>> Purushotham Pushpavanthar
>>
>>
>>
>> On Mon, 5 Aug 2019 at 15:37, Pierre Villard <pierre.villard.fr@gmail.com>
>> wrote:
>>
>>> Hi Dweep,
>>>
>>> I'll let others chime in, but here are some answers to your questions:
>>>
>>> 1) Yes - NiFi supports a very fine-grained authorization model and
>>> several authentication mechanisms.
>>> Authentication:
>>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication
>>> Authorization:
>>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization
>>>
>>> You can also find resources on the Internet on how to set up
>>> authentication & authorization.
>>>
>>> 2) I'd say it depends on your requirements and on whether you need high
>>> availability. From a pure performance standpoint, vertical scaling is
>>> probably enough for your use case unless you have very large amounts of
>>> data. Clustering will help you achieve even better performance (millions of
>>> events per second), and will improve reliability in case of failure.
>>>
>>> 3) Yes, the data is persisted. There are some parameters that you can
>>> tune based on your tolerance for data loss.
>>> Example: nifi.flowfile.repository.always.sync - If set to true, any change
>>> to the repository will be synchronized to the disk, meaning that NiFi will
>>> ask the operating system not to cache the information. This is very
>>> expensive and can significantly reduce NiFi performance. However, if it is
>>> false, there could be the potential for data loss if either there is a
>>> sudden power loss or the operating system crashes. The default value is
>>> false.
>>>
>>> In other words, unless you have serious hardware/OS failures, you should
>>> not lose any data, and everything will be persisted and restored when NiFi
>>> restarts. If avoiding data loss is critical for your system, using a
>>> broker like Kafka with the ability to replay events could be a possible
>>> solution.
>>>
>>> 4) I recommend this awesome post by Bryan:
>>> https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
>>>
>>> 5) There are some options available for the metrics. You can have a look
>>> at reporting tasks for this purpose. A set of articles you can read is
>>> available here:
>>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>>>
>>> Hope this helps!
>>> Pierre
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 5 Aug 2019 at 07:11, Dweep Sharma <dweep.sharma@redbus.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have been using NiFi to set up some pipelines. Before I can absorb
>>>> more use cases into this, I need to understand a few capabilities:
>>>>
>>>> 1) Can we set up user authentication in front of the web application? If
>>>> yes, is there a way to have role-based access for process groups? I
>>>> would like certain teams to work only on specific groups and not control
>>>> all of them.
>>>>
>>>> 2) If the major use case only involves reading from RMQ and Kafka,
>>>> converting to Parquet and storing in S3, does it make sense to set up a
>>>> cluster, or is vertical scaling good enough?
>>>>
>>>> 3) Are the flow files in the queues (connections between processors)
>>>> persisted? Would a machine failure or restart cause a loss of data? For
>>>> instance, messages are dequeued from RMQ and then lost due to a failure.
>>>> What would be the best way to handle this? I think maintaining a low back
>>>> pressure threshold can help mitigate the loss.
>>>>
>>>> 4) Does the Kafka consumer consume all partitions by default, or is
>>>> there a way to control that?
>>>>
>>>> 5) Can we have some of the processor metrics pushed out as
>>>> notifications or alerts (flow file count in/out, errors, etc.)?
>>>>
>>>> It would be great if someone could share resources that address these.
>>>>
>>>> Thanks in advance.
>>>>
>>>> -Dweep
>>>>
>>>>
>>>>
>>>>
>>>> *::DISCLAIMER::----------------------------------------------------------------------------------------------------------------------------------------------------The
>>>> contents of this e-mail and any attachments are confidential and intended
>>>> for the named recipient(s) only. E-mail transmission is not guaranteed to be
>>>> secure or error-free as information could be intercepted, corrupted, lost,
>>>> destroyed, arrive late or incomplete, or may contain viruses in
>>>> transmission. The e mail and its contents(with or without referred errors)
>>>> shall therefore not attach any liability on the originator or redBus.com.
>>>> Views or opinions, if any, presented in this email are solely those of the
>>>> author and may not necessarily reflect the views or opinions of redBus.com.
>>>> Any form of reproduction, dissemination, copying, disclosure,
>>>> modification,distribution and / or publication of this message without the
>>>> prior written consent of authorized representative of redBus.com
>>>> <http://redbus.in/> is strictly prohibited. If you have received this
>>>> email in error please delete it and notify the sender immediately. Before
>>>> opening any email and/or attachments, please check them for viruses and
>>>> other defects.*
>>>
>>>
