nifi-users mailing list archives

From <Josef.Zahn...@swisscom.com>
Subject Re: ConsumeKafkaRecord Performance Issue
Date Fri, 19 Jun 2020 06:47:11 GMT
Hi Stefan

I’m guessing you are referring to the load-balancing mechanism for flowfiles on the connection…
The issue is that the records don’t even come out of the processor fast enough, so the later
processing isn’t the problem (apologies to those who are familiar with Kafka: Kafka itself also
holds a queue of messages, which can then be consumed by the ConsumeKafkaRecord processor).
It’s something in the ConsumeKafkaRecord processor that causes the slow performance. But
thanks anyway :-). Any other comments?

Cheers Josef

From: Stefan Kok <stefan.kok@centilliard.io>
Reply to: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, 19 June 2020 at 08:29
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: ConsumeKafkaRecord Performance Issue

Hi Josef

Have you tried the load balancing which became available in later versions of NiFi? We had
struggled with performance when transferring records between Teradata and Oracle.
Once we enabled the load balancing available on the relationships between processors, we saw
a dramatic improvement in performance.

In our case, the Round Robin setting spread the load across the Record processors effectively,
thus improving the performance. We only enabled load balancing on the critical parts of the
flow.


Regards
Stefan

On Fri, 2020-06-19 at 05:55 +0000, Josef.Zahner1@swisscom.com wrote:
Hi guys,

We have run into strange behavior of the ConsumeKafkaRecord processor (and its counterpart ConsumeKafka).
We have a Kafka topic with 15 partitions and a producer which inserts, via NiFi, about
40k records per second into the topic at peak. The thing is, it doesn’t matter whether we
use the 8-node cluster or configure execution on “Primary Node”, the performance is
terrible. We ran a test with execution on “Primary Node” and started with one thread;
you can see the result below. As soon as we reached 3 threads the performance went down and
never went higher than that, no matter how many threads or cluster nodes we used. We tried
2 threads in the 8-node cluster (16 threads in total) and even more. It didn’t help, we were stuck
at 12’000’000 – 14’000’000 records per 5 min (so roughly 40k to 47k records per
second). By the way, during the tests the consumer was always behind the latest offset, so
there were a lot of messages waiting in the Kafka queue.
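
As a quick sanity check of the per-second figure, the 5-minute counts above can be converted as in the sketch below. The function name and constant are illustrative only and do not come from NiFi or Kafka.

```python
# Convert the reported 5-minute record counts into records per second.
# WINDOW_SECONDS and records_per_second are illustrative names, not NiFi APIs.
WINDOW_SECONDS = 5 * 60


def records_per_second(records_in_window: int,
                       window_seconds: int = WINDOW_SECONDS) -> float:
    """Average throughput over one statistics window."""
    return records_in_window / window_seconds


low = records_per_second(12_000_000)   # lower end of the observed plateau
high = records_per_second(14_000_000)  # upper end of the observed plateau
print(f"{low:,.0f} to {high:,.0f} records/s")  # 40,000 to 46,667 records/s
```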

[Image: test result, not preserved in the archive]


We also tested with the performance script which ships with Kafka. It showed about 250k messages/s
without any tuning at all (although without any decoding of the messages, of course). So in
theory neither Kafka nor the network in between can be the culprit. It must be something within
NiFi.

[user@nifi ~]$ /opt/kafka_2.12-2.3.1/bin/kafka-consumer-perf-test.sh \
    --broker-list kafka.xyz.net:9093 \
    --group nifi --topic events \
    --consumer.config /opt/sbd_kafka/credentials_prod/client-ssl.properties \
    --messages 3000000

start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2020-06-15 17:20:05:273, 2020-06-15 17:20:20:429, 515.7424, 34.0289, 3000000, 197941.4093, 3112, 12044, 42.8215, 249086.6822
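
To make the output above easier to read, here is a small sketch that pairs each header name with its value and recomputes the fetch rate from the message count and fetch time. The header and data strings are copied from the run above; the variable names and the NiFi comparison figure (14’000’000 records per 5 min, the upper end reported earlier) are only used for illustration.

```python
# Pair the perf-test header with its data row and recompute the fetch rate.
header = ("start.time, end.time, data.consumed.in.MB, MB.sec, "
          "data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, "
          "fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec")
row = ("2020-06-15 17:20:05:273, 2020-06-15 17:20:20:429, 515.7424, 34.0289, "
       "3000000, 197941.4093, 3112, 12044, 42.8215, 249086.6822")

fields = dict(zip((h.strip() for h in header.split(",")),
                  (v.strip() for v in row.split(","))))

messages = int(fields["data.consumed.in.nMsg"])
fetch_seconds = int(fields["fetch.time.ms"]) / 1000

# Messages per second while actually fetching (rebalance time excluded).
fetch_rate = messages / fetch_seconds

# Upper end of the NiFi plateau reported earlier: 14'000'000 records per 5 min.
nifi_rate = 14_000_000 / 300

print(f"perf test fetch rate: {fetch_rate:,.0f} msg/s")
print(f"ratio to NiFi plateau: {fetch_rate / nifi_rate:.1f}x")
```

The recomputed rate agrees with the fetch.nMsg.sec column, which is where the 250k messages/s figure comes from.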


We have also seen that “Max Poll Records” never gets reached in our case: we saw at most
about 400 records in one flowfile even though we configured 100’000, which could be part
of the problem.
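
To put those numbers side by side, the following rough sketch estimates how many flowfiles per second the processor has to emit at the observed throughput. It assumes every flowfile holds the roughly 400 records seen as a maximum, so the real flowfile rate is at least this high; the variable names are illustrative.

```python
# Rough estimate: flowfiles per second implied by ~400 records per flowfile.
observed_max_records_per_flowfile = 400   # approximate maximum seen in one flowfile
configured_max_poll_records = 100_000     # "Max Poll Records" as configured

# Observed plateau of 12'000'000 to 14'000'000 records per 5 min.
for records_per_5_min in (12_000_000, 14_000_000):
    records_per_s = records_per_5_min / 300
    flowfiles_per_s = records_per_s / observed_max_records_per_flowfile
    print(f"{records_per_s:,.0f} records/s -> at least {flowfiles_per_s:,.0f} flowfiles/s")

# Share of the configured limit that is actually used per flowfile.
fill_ratio = observed_max_records_per_flowfile / configured_max_poll_records
print(f"fill ratio: {fill_ratio:.1%}")
```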

[Image: image002.png, not preserved in the archive]

It seems I’m not alone with this issue, although the performance reported here was even worse than
ours:
https://stackoverflow.com/questions/62104646/nifi-poor-performance-of-consumekafkarecord-2-0-and-consumekafka-2-0

Any help would be really appreciated.

If nobody has an idea I have to open a bug ticket :-(.

Cheers, Josef

