nifi-users mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: NiFi consumers concurrency
Date Thu, 24 Jan 2019 20:56:54 GMT
Boris,

On the Settings tab, have you changed the value of the "Yield Duration"? The default, I believe,
is 1 second. I would recommend that you change that to "0 sec" and that may do the trick.

Thanks
-Mark

On Jan 24, 2019, at 3:30 PM, Boris Tyukin <boris@boristyukin.com> wrote:

Any ideas?

We've added another 7 topics per Kafka consumer processor (so 9 topics total), and with concurrency
set to 4 it still pulls in one thread, using the same consumer ID. The other 3 are sitting and
doing nothing.

Based on a quick review of the source code, the processor should spin up multiple Kafka consumers,
up to the concurrency number defined on the processor, but clearly this is not happening.

On Fri, Jan 18, 2019 at 10:31 AM Boris Tyukin <boris@boristyukin.com> wrote:

Hi Joe and Charlie,

Thanks for the quick response. It is good to hear it should work the way we expect it to.

We already bumped the controller thread count to 50, and we also use a demarcator to group messages
coming out of the Kafka processor (this did improve performance quite a bit).

We also tried playing with the last two properties (max poll records and max uncommitted time).

<image.png>

<image.png>

Check the screenshot below: you can see that the first two topics are consumed by the same
consumer ID, the next two by another and the next two by another, but then you have a bunch of
consumer IDs doing nothing.
<image.png>


On Fri, Jan 18, 2019 at 10:18 AM Joe Witt <joe.witt@gmail.com> wrote:
Boris

You should check the total number of threads the flow controller allows.  But also, your description
of how you'd like the processor to work is indeed how it should work.  I'd also mention that
you want to make sure you're taking advantage of Kafka's and NiFi's ability to operate on batches
efficiently.  When you poll for records/messages from Kafka, often way more than one is made
available.  However, the default mode of ConsumeKafka is a single message per flowfile.  ConsumeKafka
allows you to set a demarcator value, which means it will take what Kafka gives us, put
a delimiter between each record, and put all of that in a single flowfile.  This is great for
CSVs, for instance.  Alternatively, and more recommended, take a look at ConsumeKafkaRecord,
which inherently handles this demarcation logic and does so using the appropriate mechanism
for the given format/schema.  Depending on the use case/scenario, you might need to update
your scripted processor to operate on many records in a single flowfile, or you can restructure
it to use the record-oriented scripted processors/controller services, which is often
done for maximum performance and control.
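
To make the demarcator idea above concrete, here is a minimal sketch in plain Python. It is not NiFi or Kafka code: the function name `demarcate` and the sample records are hypothetical, and it only illustrates joining the records from one poll into a single payload instead of emitting one flowfile per message.

```python
def demarcate(records, demarcator=b"\n"):
    """Join the records from one poll into a single payload.

    Hypothetical helper for illustration only; it mimics the effect of a
    demarcator setting by placing the delimiter between records.
    """
    return demarcator.join(records)


# Pretend one poll returned three CSV lines as raw bytes.
batch = [b"id,name", b"1,alice", b"2,bob"]

# One payload (one flowfile's worth of content) instead of three.
payload = demarcate(batch)

# A downstream step can recover the individual records by splitting,
# provided the demarcator never occurs inside a record.
recovered = payload.split(b"\n")
```

The trade-off is that downstream processing has to handle many records per flowfile, which is the adjustment to the scripted processor mentioned above.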

I'm not sure how well the Kafka client will behave when it is looking at nearly 250 topics
with relatively few threads, but in any case the way you expected NiFi to behave is how it
should behave...  That we see the other threads not being active is interesting for sure.
Check the max threads on the controller and please share details of the settings on the Kafka procs.

Thanks

On Fri, Jan 18, 2019 at 10:09 AM Charlie Meyer <charlie.meyer@civitaslearning.com> wrote:
Hi Boris,

I have seen behavior similar to this before on other flows I have run. In your Controller
settings (in the hamburger menu at the top right of the UI), have you adjusted the Maximum
Timer Driven Thread Count?

On Fri, Jan 18, 2019 at 9:04 AM Boris Tyukin <boris@boristyukin.com> wrote:
Hi all, Happy Friday!

I wonder if you have any ideas how to improve concurrency with NiFi Kafka consumer processors.

We have 3 NiFi consumer processors in the flow, each listening to 250 topics. Each topic has
1 partition (it is critical for us to preserve order). List of topic names is given as a comma-delimited
list.

We set concurrency to 6 for each consumer, hoping that NiFi would spawn 6 consumers per
processor and that they would work concurrently, feeding data from 6 topics at once. And since
we have 3 consumer processors, that should give us 18 concurrent feeds in total.

<image.png>

Apparently, it does not work like that. NiFi does create 6 consumers per processor (if I set
concurrency to 6), but for some reason only one consumer is reading from Kafka, one topic
at a time, while the other 5 are sitting and doing nothing. Because of that, total throughput
is not really good.
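
One possible contributing factor, not confirmed anywhere in this thread, is how Kafka's range-style partition assignment behaves when every topic has a single partition. That strategy divides each topic's partitions among the group members independently, so a lone partition always goes to the first member in sorted order. The sketch below is a plain-Python simulation of that rule, not the Kafka client itself. The function `range_assign` and the consumer and topic names are hypothetical, and it assumes all consumers are in one group using the range strategy.

```python
def range_assign(consumers, topic_partitions):
    """Simulate range-style assignment (illustration only, not Kafka code).

    Each topic is divided on its own: partitions are split into contiguous
    ranges over the consumers sorted by name, and the first consumers in
    that order receive any leftover partitions.
    """
    assignment = {c: [] for c in consumers}
    ordered = sorted(consumers)
    for topic in sorted(topic_partitions):
        per_consumer, extra = divmod(topic_partitions[topic], len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            count = per_consumer + (1 if i < extra else 0)
            assignment[consumer].extend(
                (topic, p) for p in range(start, start + count)
            )
            start += count
    return assignment


# Six consumers in one group, nine topics with one partition each
# (hypothetical names and counts).
consumers = [f"consumer-{i}" for i in range(1, 7)]
topics = {f"topic-{i:03d}": 1 for i in range(1, 10)}

result = range_assign(consumers, topics)

# Consumers that actually received at least one partition.
busy = [c for c, parts in result.items() if parts]
```

Under these assumptions only the first consumer ends up with work, which resembles the symptom described above. Whether this applies here depends on the assignment strategy the processor's consumers actually use.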

We are looking at source code but I am hoping for some quick tips / direction.

We could create 750 NiFi consumer processors instead but we do not really like that idea.

Boris

