nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: NiFi consumers concurrency
Date Fri, 18 Jan 2019 15:18:33 GMT
Boris

You should check the total number of threads the flow controller allows.
But also your description of how you'd like the processor to work is indeed
how it should work.  I'd also mention that you want to make sure you're
taking advantage of Kafka and NiFi's ability to operate on batches
efficiently.  When you poll Kafka for records/messages, it often returns far
more than one at a time.  However, the default mode of ConsumeKafka is a
single message per flowfile.  ConsumeKafka allows you to set a demarcator
value, which means it will take what Kafka gives us, put a delimiter
between each record, and write all of that to a single flowfile.  This is great
for CSVs for instance.  Alternatively, and more recommended, take a look at
ConsumeKafkaRecord which inherently handles this demarcation logic and does
so using the appropriate mechanism for the given format/schema.  Depending
on the use case/scenario, you might need to update your scripted processor
to operate on many records in a single flowfile, or you can restructure it
to use the record-oriented scripted processors/controller services,
which is often done for maximum performance and control.
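
As a rough illustration of what "operate on many records in a single
flowfile" could look like, here is a minimal sketch in plain Python.  It
assumes the demarcator is a newline and that each record is a CSV line; the
names split_records and process_flowfile are hypothetical and are not part
of NiFi or its scripting API.

```python
from typing import Callable, List


def split_records(content: bytes, demarcator: bytes = b"\n") -> List[bytes]:
    """Split a demarcated payload into individual Kafka messages.

    Empty pieces (for example from a trailing demarcator) are dropped.
    """
    return [piece for piece in content.split(demarcator) if piece]


def process_flowfile(content: bytes,
                     handle: Callable[[bytes], object],
                     demarcator: bytes = b"\n") -> list:
    """Apply `handle` to every record, preserving the original order."""
    return [handle(record) for record in split_records(content, demarcator)]


# Hypothetical payload: three CSV messages delivered in one flowfile.
payload = b"1,alice\n2,bob\n3,carol"
rows = process_flowfile(payload, lambda r: r.decode("utf-8").split(","))
```

Because the records are handled in the order they appear in the payload,
per-topic ordering is kept as long as the flowfile content itself is in
order.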

I'm not sure how well the Kafka client will behave when it is looking at
nearly 250 topics with relatively few threads, but in any case the way you
expected NiFi to behave is how it should behave.  That we see the other
threads not being active is interesting for sure.  Check max threads on the
controller and please share details of the settings on the Kafka procs.
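
To make the thread check concrete, here is a small back-of-the-envelope
sketch in Python.  The 3 processors and 6 concurrent tasks come from the
original question; the limit of 10 is only an assumed value for Maximum
Timer Driven Thread Count, so substitute whatever your Controller settings
actually show.  The function names are hypothetical.

```python
def requested_threads(processors: int, concurrent_tasks: int) -> int:
    """Threads needed if every concurrent task runs at the same time."""
    return processors * concurrent_tasks


def thread_shortfall(requested: int, max_timer_driven_threads: int) -> int:
    """How many requested threads cannot run at once (0 if all fit)."""
    return max(0, requested - max_timer_driven_threads)


# 3 consumer processors with 6 concurrent tasks each, from the question.
requested = requested_threads(processors=3, concurrent_tasks=6)

# ASSUMPTION: a controller-wide limit of 10; check your own setting.
shortfall = thread_shortfall(requested, max_timer_driven_threads=10)
```

The thread pool is shared by every processor in the flow, so the consumers
alone needing 18 threads is a lower bound on what the controller should
allow.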

Thanks

On Fri, Jan 18, 2019 at 10:09 AM Charlie Meyer <
charlie.meyer@civitaslearning.com> wrote:

> Hi Boris,
>
> I have seen behavior similar to this before on other flows I have run. In
> your Controller settings (in the hamburger menu at the top right of the
> UI), have you adjusted the Maximum Timer Driven Thread Count?
>
> On Fri, Jan 18, 2019 at 9:04 AM Boris Tyukin <boris@boristyukin.com>
> wrote:
>
>> Hi all, Happy Friday!
>>
>> I wonder if you have any ideas how to improve concurrency with NiFi Kafka
>> consumer processors.
>>
>> We have 3 NiFi consumer processors in the flow, each listening to 250
>> topics. Each topic has 1 partition (it is critical for us to preserve
>> order). List of topic names is given as a comma-delimited list.
>>
>> We set concurrency to 6 for each consumer, hoping that NiFi would
>> spawn 6 consumers per processor and that they would work concurrently,
>> feeding data from 6 topics at once. Since we have 3 consumer
>> processors, that should give us 18 concurrent feeds total.
>>
>> [image: image.png]
>>
>> Apparently, it does not work like that. NiFi does create 6 consumers per
>> processor (if I set concurrency to 6), but for some reason only one consumer
>> is reading from Kafka, one topic at a time, while the other 5 sit idle.
>> Because of that, total throughput is not really good.
>>
>> We are looking at source code but I am hoping for some quick tips /
>> direction.
>>
>> We could create 750 NiFi consumer processors instead but we do not really
>> like that idea.
>>
>> Boris
>>
>
