nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Is nifi a good fit for this use case?
Date Wed, 18 Nov 2015 16:52:17 GMT
Hello Philippe,

I believe what you state to be true, but I should be clear that I am
not an expert in the details of Storm.  I would definitely encourage
you to ask their community or check their docs.  That said, in the
case of Storm such a concept makes less sense, since in Storm you
design the flow/topology you want and then unleash it on the cluster,
and Storm (as I understand it) determines where it runs and such.  With
NiFi you are interactively altering the live/active flow of data.
This model for NiFi makes great sense because we're operating in the
'dataflow space' where we see our job as capturing/acquiring data from
any number of systems, doing routing, transformation, etc., and
ultimately delivering to any number of destination systems.  I think the
model for Storm makes great sense too for their world which is firing
off processing tasks.  I think Storm would be less effective in our
space and I think our solution would be less effective in their space.
You can do data processing in both and you can do some elements of
dataflow/system integration in both.  That does create confusion for
folks admittedly as they look at each because depending on their
perspective they see them as competitive.  We've even been compared to
Spark which is also kinda wild.

It is all about design tradeoffs system developers make.  In NiFi all
of our tradeoffs are based on our strong focus on striving for
excellence in the dataflow/system integration space.  For higher-order
complex event processing, use systems like Storm, Spark, and others
that were designed for that side.  NiFi will happily feed data to and
receive data from such systems.

Thanks
Joe

On Wed, Nov 18, 2015 at 2:33 AM,  <philippe.gibert@orange.com> wrote:
> Hello Joe
> Thanks for these clear explanations.
> Another great feature available in NiFi compared with Storm is (if I understand well :-) ):
> - The possibility to stop processors, then add some processors in the middle of the topology, and then restart the workflow.
> It can be described as runtime topology modification, and that behavior is not possible with Storm (right? Please tell me if I am wrong).
>
> Philippe
> Best regards
>
> -----Original Message-----
> From: Joe Witt [mailto:joe.witt@gmail.com]
> Sent: Wednesday, November 11, 2015 03:28
> To: users@nifi.apache.org
> Subject: Re: Is nifi a good fit for this use case?
>
> Darren,
>
> In short, yes I think NiFi can handle such a case in a generic sense quite well.
>
> Read on for the longer response...
>
> NiFi can process extremely large data, extremely large datasets, extremely small data and high rates, variable-sized data, etc. It makes this efficient by design: the content repository supports pass-by-reference and copy-on-write behavior, and NiFi operates in a manner that allows disk caching benefits to really shine through.
>
> Now that said, if all that is of interest is pure 'processing' and having a general-purpose processing framework, then Storm, Spark, and others are focused solely on that space.  NiFi is focused on the management of dataflows from wherever in your enterprise data is created, produced, etc., to and through processing systems and ultimately into storage systems like HDFS, NoSQL stores, and relational databases.
>
> So depending on what you're trying to do to these documents, be it feature extraction, transformation, etc., NiFi may be a great choice or NiFi may simply be the tool you use to feed this data into systems like Storm or Spark or others.  You can absolutely parallelize the flow of data across a NiFi cluster.  For producers we offer a library to interact with our site-to-site protocol, which will handle things like load balancing and failover and make it really easy to stream data to NiFi.  Or NiFi itself could pull from your system if perhaps these documents are sitting as files or available via some other supported interface.
>
> NiFi can be configured to control the rate of processing, queue data, apply back-pressure, and handle errors, and it has a number of other features that are beneficial to the dataflow management problem.
>
> NiFi supports making tradeoffs at key points in the flow for batch (time tolerant) or low latency (time sensitive) processing/distribution.  Whether data arrives in a streaming or batch fashion and whether it must be delivered to systems in batch or streaming fashion is a concern that NiFi handles well so the various systems can be less coupled.
>
> Regarding its elasticity, I will state that NiFi is not elastic in the sense that it will (at this time) automatically provision additional nodes to take on the workload and then deprovision them as the load decreases.  We will get there.  But what we support are key capabilities like event-driven processing with upper bounds on threads, back-pressure which can propagate to the source causing data to go to lesser loaded nodes, and so on.  These are elements of elastic behavior but it is not elastic provisioning (as folks often mean).
>
> I hope this response is helpful.  If any of this was unclear or you want to dive deeper just let us know.
>
> Thanks
> Joe
>
> On Tue, Nov 10, 2015 at 6:30 PM, Darren Govoni <darren@ontrenet.com> wrote:
>> Hi,
>>   I studied the nifi website a bit and if I missed a key part, forgive
>> me for asking this question.
>> But I am wondering if or how nifi can accommodate processing large
>> data sets with possibly compute intensive operations.
>> For example, if we have say 2 million documents, how does nifi make
>> processing these documents efficient?
>> I understand the visual workflow and it's nice. How is that
>> parallelized across a data set?
>>
>> Do we submit all the documents to a cluster of flows (how many?) that
>> execute some number of documents simultaneously?
>> Does nifi support batch processing? Is it elastic?
>>
>> Thanks.
>
> _________________________________________________________________________________________________________________________
>
> This message and its attachments may contain confidential or privileged information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
> Thank you.
>
