nifi-users mailing list archives

From Clay Teahouse <clayteaho...@gmail.com>
Subject Re: Optimizing Performance of Apache NiFi's Network Listening Processors
Date Tue, 06 Aug 2019 14:28:49 GMT
Many thanks, Bryan, for the quick feedback. I will look into these options.

On Tue, Aug 6, 2019 at 9:24 AM Bryan Bende <bbende@gmail.com> wrote:

> OK, that makes sense. There are basically two options to make it efficient:
>
> A) You can use ListenSyslog with batching, followed by ValidateRecord
> with one of the syslog record readers [1][2].
>
> B) You can use ListenTCPRecord with a syslog record reader.
>
> Option A will probably work better for a larger number of TCP connections;
> option B would work better for a smaller number of connections.
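A minimal Python sketch of what option A does to one batched flowfile: split the content on the message delimiter and route each line to a valid or an invalid list, roughly as ValidateRecord does with a syslog reader. The regular expression is a simplified RFC 5424 header pattern assumed for illustration (it is not NiFi's Syslog5424Reader), and validate_batch is a hypothetical name.

```python
import re
from typing import List, Tuple

# Simplified RFC 5424 header (illustrative assumption, not NiFi's parser):
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>[1-9]\d{0,2}) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<appname>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|\[.*?\])(?: (?P<body>.*))?$"
)


def validate_batch(batch: str, delimiter: str = "\n") -> Tuple[List[dict], List[str]]:
    """Split one batched flowfile's content into parsed valid records and
    invalid raw lines (hypothetical helper mimicking a valid/invalid split)."""
    valid: List[dict] = []
    invalid: List[str] = []
    for line in batch.split(delimiter):
        if not line:
            continue  # skip the empty string left by a trailing delimiter
        match = RFC5424.match(line)
        if match:
            valid.append(match.groupdict())  # header fields plus body
        else:
            invalid.append(line)
    return valid, invalid
```

Each valid record comes back as a dict of header fields (hostname, appname, body, and so on), which is what makes per-record rewriting possible further down the flow.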
>
> One challenge with both of them is that there isn't a syslog record
> writer, so you would probably have to use the
> FreeFormTextRecordSetWriter [3] with some expression that rewrites the
> message using the record fields, such as "${hostname} ${body}" if you
> wanted to rewrite each message with the hostname and body.
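To illustrate the FreeFormTextRecordSetWriter idea, the sketch below renders each parsed record through a "${hostname} ${body}" template. It uses Python's string.Template, which happens to share the ${name} placeholder syntax; it is not NiFi Expression Language, and rewrite_records is a hypothetical helper.

```python
from string import Template
from typing import Iterable, List, Mapping

# Same placeholder syntax as the writer expression in the message above;
# string.Template only imitates the idea, it does not evaluate NiFi EL.
LINE_TEMPLATE = Template("${hostname} ${body}")


def rewrite_records(records: Iterable[Mapping[str, str]],
                    template: Template = LINE_TEMPLATE) -> str:
    """Render every parsed record with the template, one output line per record."""
    lines: List[str] = [template.substitute(rec) for rec in records]
    # Newline-terminate each line; an empty batch produces empty content.
    return "\n".join(lines) + "\n" if lines else ""
```

Extra metadata can be prefixed by extending the template, for example Template("${hostname} ${appname} ${body}"), provided every record carries those fields; substitute() raises KeyError otherwise.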
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.2/org.apache.nifi.syslog.SyslogReader/index.html
> [2]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.2/org.apache.nifi.syslog.Syslog5424Reader/index.html
> [3]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.2/org.apache.nifi.text.FreeFormTextRecordSetWriter/index.html
>
> On Tue, Aug 6, 2019 at 10:08 AM Clay Teahouse <clayteahouse@gmail.com> wrote:
> >
> > Hello Bryan,
> >
> > I am ingesting millions of syslog records from various data sources. I need to make sure the format is valid, then prefix each message with the host name (from the syslog header) and some other metadata, and push the records to various consumers.
> >
> > thanks
> > Clay
> >
> > On Tue, Aug 6, 2019 at 6:26 AM Bryan Bende <bbende@gmail.com> wrote:
> >>
> >> Can you describe what you want to do with each message?
> >>
> >> Right now I’m not following why you need to parse them.
> >>
> >> On Tue, Aug 6, 2019 at 6:40 AM Clay Teahouse <clayteahouse@gmail.com> wrote:
> >>>
> >>> Bryan,
> >>> Understood, but then wouldn't this processor be inefficient when dealing with a very large number of syslog messages, if you don't have the batching option? I suppose there could have been an option to parse each syslog record in a batch and then write the syslog message, along with the syslog headers, to the flowfile content.
> >>> thanks
> >>> Clay
> >>>
> >>> On Mon, Aug 5, 2019 at 12:12 PM Bryan Bende <bbende@gmail.com> wrote:
> >>>>
> >>>> Clay,
> >>>>
> >>>> You can only parse when it's 1 message per flow file, because parsing
> >>>> adds all the field/value pairs as flow file attributes, which wouldn't
> >>>> really make sense when you have, say, 1k messages with all different
> >>>> values for those fields.
> >>>>
> >>>> -Bryan
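The sketch below illustrates Bryan's point: flowfile attributes form one flat key/value map per flowfile, so flattening a batch of parsed records into attributes keeps only the last record's values, whereas writing the records into the content (here as JSON lines) keeps every one. The "syslog." attribute prefix and both helper names are assumptions for illustration.

```python
import json
from typing import Dict, List


def as_attributes(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten parsed fields into a single flat map, like flowfile attributes;
    each later record silently overwrites the earlier record's values."""
    attrs: Dict[str, str] = {}
    for rec in records:
        for key, value in rec.items():
            attrs["syslog." + key] = value  # assumed naming, for illustration
    return attrs


def as_content(records: List[Dict[str, str]]) -> str:
    """Serialize every record, headers included, into the content as JSON lines."""
    return "".join(json.dumps(rec, sort_keys=True) + "\n" for rec in records)
```

With two records, as_attributes returns only the second record's hostname and body, while as_content returns two lines, one per record.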
> >>>>
> >>>> On Mon, Aug 5, 2019 at 11:25 AM Clay Teahouse <clayteahouse@gmail.com> wrote:
> >>>> >
> >>>> > Hi Edward, Bryan
> >>>> > One more question regarding ListenSyslog: is it possible to set the batch size > 1 with parse set to true? I am ingesting a very high volume of syslog records and want to avoid flowfiles containing only one record, but at the same time I want to be able to parse the records. Is there a way around this?
> >>>> >
> >>>> > thanks
> >>>> > Clay
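Batching itself is simple. As a rough sketch (a hypothetical helper, not ListenSyslog's code), messages are grouped into chunks of at most max_batch_size and joined with a delimiter, so each chunk would become one flowfile instead of one flowfile per message.

```python
from typing import Iterable, Iterator, List


def batch_messages(messages: Iterable[str], max_batch_size: int,
                   delimiter: str = "\n") -> Iterator[str]:
    """Yield delimiter-joined chunks of at most max_batch_size messages."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    batch: List[str] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == max_batch_size:
            yield delimiter.join(batch)  # a full chunk becomes one "flowfile"
            batch = []
    if batch:
        yield delimiter.join(batch)  # flush the final, partially filled chunk
```

For example, five messages with max_batch_size=2 produce three chunks: two full ones and a final one holding the fifth message.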
> >>>> >
> >>>> > On Fri, Aug 2, 2019 at 8:50 AM Edward Armes <edward.armes@gmail.com> wrote:
> >>>> >>
> >>>> >> Hi Clay,
> >>>> >>
> >>>> >> As Bryan said, the actual connections are managed by a selector. All the selector does is go through each connection, and once a connection has data to receive, it hands that connection over to a thread in the TCP receiving thread pool. That thread does some basic TCP processing and puts the data into a buffer, which the associated ListenSyslog processor instance processes when the framework executes it.
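The receive path Edward describes can be sketched with Python's selectors module and a ThreadPoolExecutor: one selector watches every registered connection, only connections with data ready are handed to a pool thread, and the worker puts what it read into a shared buffer for the processor. This is an analogy under stated assumptions, not NiFi's Java implementation; poll_once and read_ready are hypothetical names.

```python
import queue
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def read_ready(conn: socket.socket, buffer: "queue.Queue[bytes]") -> None:
    """Worker side: read what is available and put it in the shared buffer."""
    data = conn.recv(4096)
    if data:
        buffer.put(data)


def poll_once(sel: selectors.BaseSelector, pool: ThreadPoolExecutor,
              buffer: "queue.Queue[bytes]", timeout: float = 1.0) -> int:
    """One pass of the selector loop: idle connections are never reported as
    ready, so only connections with data cost a pool thread."""
    futures = []
    for key, _mask in sel.select(timeout):
        futures.append(pool.submit(read_ready, key.fileobj, buffer))
    for fut in futures:
        fut.result()  # wait here only so a single pass is deterministic
    return len(futures)
```

A long-running loop would also stop watching a connection while a worker is reading it (or dedicate one reader per connection); otherwise the selector could report the same socket as ready again. Waiting for the workers before returning avoids that in this single-pass sketch.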
> >>>> >>
> >>>> >> Just so you're aware: setting the maximum number of connections to 4,000 does create a thread pool sized for 4,000 threads, but in reality those threads don't exist until the selector hands work to the pool. So, in short, unless a single NiFi server receives 4,000 syslog messages in a very short space of time (< 1 microsecond), I can't see it being an issue.
> >>>> >>
> >>>> >> Edward
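Edward's point that a large maximum does not mean thousands of live threads can be checked with Python's ThreadPoolExecutor, which starts worker threads on demand as tasks are submitted. The snippet below is an analogy, not NiFi's thread pool; threads_after is a hypothetical helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def threads_after(max_workers: int, tasks: int) -> Tuple[int, int]:
    """Return (threads started by creating the pool, threads alive after
    running `tasks` short jobs), both relative to the starting count."""
    baseline = threading.active_count()
    pool = ThreadPoolExecutor(max_workers=max_workers)
    created = threading.active_count() - baseline  # no workers started yet
    futures = [pool.submit(time.sleep, 0.01) for _ in range(tasks)]
    for fut in futures:
        fut.result()
    used = threading.active_count() - baseline  # at most `tasks` workers
    pool.shutdown(wait=True)
    return created, used
```

Java's ThreadPoolExecutor behaves similarly: by default even core threads are started only as tasks arrive, unless prestartCoreThread() or prestartAllCoreThreads() is called.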
> >>>> >>
> >>>> >> On Fri, Aug 2, 2019 at 2:06 PM Bryan Bende <bbende@gmail.com> wrote:
> >>>> >>>
> >>>> >>> The actual connections themselves are managed with a selector, so if
> >>>> >>> all the connections are idle, there should only be one thread for the
> >>>> >>> socket.
> >>>> >>>
> >>>> >>> As soon as a connection has something available to read, a thread
> >>>> >>> is spawned to read from the connection until either no more data is
> >>>> >>> available or the connection is closed.
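The per-connection read loop Bryan describes (read until nothing more is available or the peer closes) looks roughly like this on a non-blocking socket. drain_connection is a hypothetical helper; an empty recv() result is the standard signal that the peer has closed the connection.

```python
import socket
from typing import List, Tuple


def drain_connection(conn: socket.socket, bufsize: int = 4096) -> Tuple[List[bytes], bool]:
    """Read until no more data is available right now, or the peer closes.
    Returns (chunks read, whether the connection was closed)."""
    conn.setblocking(False)
    chunks: List[bytes] = []
    while True:
        try:
            data = conn.recv(bufsize)
        except BlockingIOError:
            return chunks, False  # nothing left for now; the thread can be released
        if not data:
            return chunks, True  # empty read: the peer closed the connection
        chunks.append(data)
```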
> >>>> >>>
> >>>> >>> On Fri, Aug 2, 2019 at 7:18 AM Clay Teahouse <clayteahouse@gmail.com> wrote:
> >>>> >>> >
> >>>> >>> > Hello Edward,
> >>>> >>> > So, if I have to listen to 32,000 TCP connections and I have only 80 cores, and I configure each ListenSyslog instance for 4,000 connections, doesn't each instance spawn 4,000 threads behind the scenes? The TCP connections will be idle most of the time.
> >>>> >>> >
> >>>> >>> > thanks
> >>>> >>> > Clay
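A quick sizing check for the numbers in this thread, assuming the 32,000 connections are spread evenly over the 10 nodes with 8 cores each from the original question (an assumption; real traffic may be skewed). connection_budget is a hypothetical helper.

```python
import math
from typing import Dict


def connection_budget(total_connections: int, nodes: int, cores_per_node: int,
                      max_conns_per_instance: int) -> Dict[str, int]:
    """Back-of-the-envelope split of connections across a cluster,
    assuming an even spread over nodes and cores."""
    per_node = math.ceil(total_connections / nodes)
    return {
        "connections_per_node": per_node,
        "connections_per_core": math.ceil(total_connections / (nodes * cores_per_node)),
        # listener instances (each capped at max_conns_per_instance) needed per node
        "instances_per_node": math.ceil(per_node / max_conns_per_instance),
    }
```

With 32,000 connections, 10 nodes, 8 cores per node and a 4,000-connection cap, this gives 3,200 connections per node, 400 per core and one instance per node. Given the selector model described above, the 400 per core matter only while they are actively sending; idle connections do not hold a thread.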
> >>>> >>> >
> >>>> >>> >
> >>>> >>> > On Fri, Aug 2, 2019 at 6:10 AM Edward Armes <edward.armes@gmail.com> wrote:
> >>>> >>> >>
> >>>> >>> >> Hi Clay,
> >>>> >>> >>
> >>>> >>> >> Because NiFi uses a thread pool for its own threading, and each processor instance runs in its own thread, I don't see any reason why not. One thing to note is that the ListenTCP processor appears to have been written so that it takes all the requests that have been received on that socket and processes them until either it has no more requests left to process or that instance of the processor is no longer scheduled to run.
> >>>> >>> >>
> >>>> >>> >> Hope that helps
> >>>> >>> >>
> >>>> >>> >> Edward
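The drain-until-empty behaviour Edward describes for ListenTCP can be sketched as below: keep taking buffered messages while the processor is still scheduled, and stop as soon as the buffer is empty. on_trigger and still_scheduled are hypothetical stand-ins for the NiFi framework, which is written in Java.

```python
import queue
from typing import Callable, List


def on_trigger(buffer: "queue.Queue[str]", still_scheduled: Callable[[], bool]) -> List[str]:
    """Drain buffered messages until the buffer is empty or the processor
    is no longer scheduled (both conditions end this invocation)."""
    processed: List[str] = []
    while still_scheduled():
        try:
            msg = buffer.get_nowait()
        except queue.Empty:
            break  # no more requests left to process
        processed.append(msg)
    return processed
```

Anything left in the buffer when scheduling stops is simply picked up by the next invocation.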
> >>>> >>> >>
> >>>> >>> >> On Fri, Aug 2, 2019 at 11:28 AM Clay Teahouse <clayteahouse@gmail.com> wrote:
> >>>> >>> >>>
> >>>> >>> >>> Hello All,
> >>>> >>> >>>
> >>>> >>> >>> I need to listen to and process thousands of persistent TCP connections. I have 10 nodes, each having 8 cores.
> >>>> >>> >>> My understanding is that with the existing NiFi listening processors, such as ListenSyslog, a thread is used for each TCP connection. Does this scale? Do I need to write a custom processor that uses a thread pool for reading the data from the socket and processing it?
> >>>> >>> >>>
> >>>> >>> >>> thanks
> >>>> >>> >>> Clay
> >>
>
