nifi-users mailing list archives

From Clay Teahouse <clayteaho...@gmail.com>
Subject Re: Optimizing Performance of Apache NiFi's Network Listening Processors
Date Tue, 06 Aug 2019 14:07:54 GMT
Hello Bryan,

I am ingesting millions of syslog records from various data sources. I need
to make sure the format is valid, then prefix each message with the host
name (from the syslog header) and some other metadata, and push the records
to various consumers.

thanks
Clay
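A minimal Java sketch of the per-record work described above (validate the format, then prefix each message with the host name from the header plus some metadata), applied to a newline-delimited batch so that one flow file could still hold many records. It assumes RFC 3164-style lines (`<PRI>Mmm dd hh:mm:ss HOSTNAME MSG`); the class name `SyslogBatchPrefixer`, the regex, and the `extra` metadata string are hypothetical illustrations, not NiFi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: validates RFC 3164-style lines and prefixes each with its host name. */
public class SyslogBatchPrefixer {
    // <PRI>TIMESTAMP HOSTNAME MSG, e.g. "<34>Oct 11 22:14:15 mymachine su: failed"
    private static final Pattern RFC3164 = Pattern.compile(
            "^<(\\d{1,3})>([A-Z][a-z]{2} [ \\d]\\d \\d{2}:\\d{2}:\\d{2}) (\\S+) (.*)$");

    public final List<String> valid = new ArrayList<>();
    public final List<String> invalid = new ArrayList<>();

    /** Processes a newline-delimited batch; 'extra' stands in for any other metadata. */
    public static SyslogBatchPrefixer process(String batch, String extra) {
        SyslogBatchPrefixer result = new SyslogBatchPrefixer();
        for (String line : batch.split("\n")) {
            if (line.isEmpty()) continue;
            Matcher m = RFC3164.matcher(line);
            int pri = m.matches() ? Integer.parseInt(m.group(1)) : -1;
            if (pri < 0 || pri > 191) {       // no match, or PRI outside 0..191 (facility * 8 + severity)
                result.invalid.add(line);
                continue;
            }
            // Prefix: host name from the header, then the extra metadata, then the original line.
            result.valid.add(m.group(3) + " " + extra + " " + line);
        }
        return result;
    }

    public static void main(String[] args) {
        String batch = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed\n"
                     + "not a syslog line\n";
        SyslogBatchPrefixer r = process(batch, "dc=east");
        r.valid.forEach(System.out::println);
        System.out.println("invalid: " + r.invalid.size());
    }
}
```

Lines that fail the pattern, or whose PRI is above 191, are collected in `invalid` rather than dropped, so they could be routed separately; RFC 5424 messages would need a different pattern.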

On Tue, Aug 6, 2019 at 6:26 AM Bryan Bende <bbende@gmail.com> wrote:

> Can you describe what you want to do with each message?
>
> Right now I’m not following why you need to parse them.
>
> On Tue, Aug 6, 2019 at 6:40 AM Clay Teahouse <clayteahouse@gmail.com>
> wrote:
>
>> Bryan,
>> Understood, but wouldn't this processor then be inefficient when dealing
>> with a very large number of syslog messages, since you lose the batching
>> option? I suppose we could have had an option to parse each syslog record
>> in a batch and then write the syslog message, along with the syslog
>> headers, to the flowfile content.
>> thanks
>> Clay
>>
>> On Mon, Aug 5, 2019 at 12:12 PM Bryan Bende <bbende@gmail.com> wrote:
>>
>>> Clay,
>>>
>>> You can only parse when there is 1 message per flow file, because parsing
>>> adds all the field/value pairs as flow file attributes, which wouldn't
>>> really make sense when you have, say, 1k messages with all different
>>> values for those fields.
>>>
>>> -Bryan
>>>
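Bryan's point can be shown with a plain `Map`. In this sketch the map only stands in for a flow file's attributes (it is not the NiFi API), the keys are modeled on the `syslog.hostname`-style attribute names ListenSyslog writes, and the class and method names are hypothetical. Merging a parsed batch into one attribute map keeps only the last message's values, while writing the parsed fields into the content keeps all of them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttributeCollisionSketch {
    /** Hypothetical per-message parse result, keyed like ListenSyslog's syslog.* attributes. */
    static Map<String, String> parse(String host, String body) {
        Map<String, String> fields = new HashMap<>();
        fields.put("syslog.hostname", host);
        fields.put("syslog.body", body);
        return fields;
    }

    /** One flow file has one attribute map, so later messages overwrite earlier ones. */
    static Map<String, String> mergeAsAttributes(List<Map<String, String>> batch) {
        Map<String, String> attributes = new HashMap<>();
        batch.forEach(attributes::putAll);
        return attributes;
    }

    /** Writing the parsed fields into the content keeps one line per message. */
    static String toContent(List<Map<String, String>> batch) {
        StringBuilder content = new StringBuilder();
        for (Map<String, String> m : batch) {
            content.append(m.get("syslog.hostname")).append(' ')
                   .append(m.get("syslog.body")).append('\n');
        }
        return content.toString();
    }

    public static void main(String[] args) {
        List<Map<String, String>> batch = List.of(
                parse("host-a", "disk full"),
                parse("host-b", "link down"),
                parse("host-c", "login ok"));
        System.out.println(mergeAsAttributes(batch).get("syslog.hostname")); // host-c
        System.out.print(toContent(batch));  // three lines, one per message
    }
}
```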
>>> On Mon, Aug 5, 2019 at 11:25 AM Clay Teahouse <clayteahouse@gmail.com>
>>> wrote:
>>> >
>>> > Hi Edward, Bryan
>>> > One more question regarding ListenSyslog. Is it possible to set batch
>>> > size > 1 with parse set to true? I am ingesting a very high volume of
>>> > syslog records and want to avoid flowfiles containing only one record,
>>> > but at the same time I want to be able to parse the records. Is there a
>>> > way around this?
>>> >
>>> > thanks
>>> > Clay
>>> >
>>> > On Fri, Aug 2, 2019 at 8:50 AM Edward Armes <edward.armes@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi Clay,
>>> >>
>>> >> So, as Bryan said, the actual connections are managed by a selector. All
>>> >> it does is go through each connection, and once a connection has data to
>>> >> receive, the selector hands it over to a thread in the TCP receiving
>>> >> thread pool. That thread does some basic TCP processing and puts the data
>>> >> into a buffer, which the associated ListenSyslog processor instance
>>> >> processes when the framework executes it.
>>> >>
>>> >> Just so you're aware: setting the maximum number of connections to 4,000
>>> >> does create a thread pool of up to 4,000 threads, but in reality those
>>> >> threads don't exist until the selector needs one to run work on the pool.
>>> >> So, in short, unless a single NiFi server gets 4,000 syslog messages in a
>>> >> very short space of time (< 1 microsecond), I can't see it being an issue.
>>> >>
>>> >> Edward
>>> >>
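Edward's point about threads not existing until they are needed matches how a standard `java.util.concurrent.ThreadPoolExecutor` behaves. The sketch below uses plain JDK classes and is not NiFi's listener code; the core size of 0, the `SynchronousQueue`, and the 60-second keep-alive are assumptions chosen to make the on-demand behavior visible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LazyPoolSketch {
    /** Returns {threads before any work, threads while concurrentTasks tasks are busy}. */
    static int[] demo(int maxThreads, int concurrentTasks) throws InterruptedException {
        // Core size 0 + SynchronousQueue: a thread is created only when a task arrives and no
        // idle thread is waiting; idle threads are reclaimed after 60 seconds.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, maxThreads, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        int before = pool.getPoolSize();

        CountDownLatch started = new CountDownLatch(concurrentTasks);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < concurrentTasks; i++) {
            pool.execute(() -> {                 // stand-in for "a connection has data to read"
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        started.await();
        int busy = pool.getPoolSize();

        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return new int[] {before, busy};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sizes = demo(4000, 3);
        System.out.println("threads before any work: " + sizes[0]);     // 0
        System.out.println("threads with 3 busy tasks: " + sizes[1]);    // 3
    }
}
```

The laziness depends on configuration: with a core size of 4,000, threads would still be started only as tasks arrive, but the pool would add a new one per submitted task until it reached the core size, and those core threads would not be reclaimed when idle unless `allowCoreThreadTimeOut(true)` is set.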
>>> >> On Fri, Aug 2, 2019 at 2:06 PM Bryan Bende <bbende@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> The actual connections themselves are managed with a selector, so if
>>> >>> all the connections are idle there should only be one thread for the
>>> >>> socket.
>>> >>>
>>> >>> As soon as a connection has something available to read, a thread
>>> >>> is spawned to start reading the connection until either no more data
>>> >>> is available or it is closed.
>>> >>>
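The selector-plus-pool design described here can be sketched with plain `java.nio`: one selector thread watches every connection, and only when a connection becomes readable is it handed to a pooled thread that reads until no more data is available (then hands the key back to the selector) or until the peer closes. This is an illustrative sketch, not NiFi's actual listener code; the class name, the 4 KB read buffer, the `BlockingQueue` standing in for the processor's buffer, and the loopback demo are all assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.*;

public class SelectorHandoffSketch {
    /** Starts a one-selector server on loopback, sends payload from a client, returns what was buffered. */
    static String runDemo(String payload) throws Exception {
        // Buffer between the reader threads and the consumer (stand-in for a processor's queue).
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));        // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        ExecutorService readers = Executors.newCachedThreadPool();  // threads created on demand

        Thread selectorThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    selector.select(100);                           // one thread watches all connections
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (!key.isValid()) continue;
                        if (key.isAcceptable()) {
                            SocketChannel client = server.accept();
                            if (client == null) continue;
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) {
                            key.interestOps(0);                     // pause selection while a reader owns it
                            readers.execute(() -> readUntilDrained(key, events));
                        }
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        selectorThread.start();

        StringBuilder received = new StringBuilder();
        try (Socket client = new Socket("127.0.0.1", server.socket().getLocalPort())) {
            OutputStream out = client.getOutputStream();
            out.write(payload.getBytes(StandardCharsets.UTF_8));
            out.flush();
            // Consumer side: drain the buffer until the whole payload has arrived or we time out.
            while (received.length() < payload.length()) {
                String chunk = events.poll(5, TimeUnit.SECONDS);
                if (chunk == null) break;
                received.append(chunk);
            }
        } finally {
            selectorThread.interrupt();
            selectorThread.join();
            readers.shutdownNow();
            selector.close();
            server.close();
        }
        return received.toString();
    }

    /** Reads until no more data is available (then re-arms the key) or the peer closes. */
    static void readUntilDrained(SelectionKey key, BlockingQueue<String> events) {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            int n;
            while ((n = channel.read(buf)) > 0) {
                buf.flip();
                events.add(StandardCharsets.UTF_8.decode(buf).toString());
                buf.clear();
            }
            if (n < 0) {                                  // connection closed by the client
                channel.close();
                return;
            }
            key.interestOps(SelectionKey.OP_READ);        // idle again: hand it back to the selector
            key.selector().wakeup();
        } catch (IOException | RuntimeException e) {
            try { channel.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runDemo("<34>Oct 11 22:14:15 host su: test\n"));
    }
}
```

The `key.interestOps(0)` call keeps the selector from handing the same connection to a second reader while the first one is still draining it. Chunks are not split on message boundaries here; a real listener would also frame the stream (for example, by newline) before queueing messages.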
>>> >>> On Fri, Aug 2, 2019 at 7:18 AM Clay Teahouse <clayteahouse@gmail.com>
>>> >>> wrote:
>>> >>> >
>>> >>> > Hello Edward,
>>> >>> > So, if I have to listen to 32,000 TCP connections and I have only
>>> >>> > 80 cores, and I configure each ListenSyslog instance for 4,000
>>> >>> > connections, doesn't each instance spawn 4,000 threads behind the
>>> >>> > scenes? The TCP connections will be idle most of the time.
>>> >>> >
>>> >>> > thanks
>>> >>> > Clay
>>> >>> >
>>> >>> >
>>> >>> > On Fri, Aug 2, 2019 at 6:10 AM Edward Armes <edward.armes@gmail.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Hi Clay,
>>> >>> >>
>>> >>> >> Because NiFi uses a thread pool for its own threading underneath,
>>> >>> >> and each processor instance runs in its own thread, I don't see any
>>> >>> >> reason why not. One thing to note: the ListenTCP processor appears to
>>> >>> >> have been written so that it takes all the requests that have been
>>> >>> >> received on that socket and processes them until either it has no
>>> >>> >> more requests left to process or that instance of the processor is
>>> >>> >> no longer scheduled to run.
>>> >>> >>
>>> >>> >> Hope that helps
>>> >>> >>
>>> >>> >> Edward
>>> >>> >>
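The consuming side Edward describes, where each run of the processor takes buffered requests until none are left or it is no longer scheduled, can be sketched as a simple drain loop. The method name `drainOnce`, the `BooleanSupplier` standing in for the framework's scheduling state, and the `maxBatch` cap are hypothetical, not the ListenTCP source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BooleanSupplier;

public class DrainLoopSketch {
    /**
     * One hypothetical processor run: take buffered messages until the buffer is empty,
     * the batch is full, or the processor is no longer scheduled.
     */
    static List<String> drainOnce(BlockingQueue<String> buffer, int maxBatch,
                                  BooleanSupplier stillScheduled) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxBatch && stillScheduled.getAsBoolean()) {
            String msg = buffer.poll();   // non-blocking: null means nothing left right now
            if (msg == null) break;
            batch.add(msg);
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> buffer =
                new LinkedBlockingQueue<>(List.of("m1", "m2", "m3", "m4", "m5"));
        System.out.println(drainOnce(buffer, 3, () -> true));   // [m1, m2, m3]
        System.out.println(drainOnce(buffer, 3, () -> true));   // [m4, m5]
        System.out.println(drainOnce(buffer, 3, () -> false));  // []
    }
}
```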
>>> >>> >> On Fri, Aug 2, 2019 at 11:28 AM Clay Teahouse <clayteahouse@gmail.com>
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> Hello All,
>>> >>> >>>
>>> >>> >>> I need to listen to and process thousands of persistent TCP
>>> >>> >>> connections. I have 10 nodes, each having 8 cores.
>>> >>> >>> My understanding is that with the existing NiFi listening
>>> >>> >>> processors, such as ListenSyslog, a thread is utilized for each TCP
>>> >>> >>> connection. Does this scale? Do I need to write a custom processor
>>> >>> >>> that utilizes a thread pool for reading the data from the socket and
>>> >>> >>> processing it?
>>> >>> >>>
>>> >>> >>> thanks
>>> >>> >>> Clay
>>>
>
