nifi-users mailing list archives

From Stefan Kok <stefan....@centilliard.io>
Subject Re: ConsumeKafkaRecord Performance Issue
Date Fri, 19 Jun 2020 06:29:20 GMT
Hi Josef
Have you tried the load balancing which became available in later
versions of NiFi? We had struggled with performance relating to the
transfer of records between Teradata and Oracle. Once we enabled the
load balancing available on the connections between processors, we
saw a dramatic improvement in performance.
In our case, the Round Robin setting spread the load across the
record processors effectively, thus improving the performance. We
only enabled load balancing on the critical parts of the flow.
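If it helps, the same thing can also be set without the UI. Below is a
rough, untested sketch of updating a connection's load-balance strategy
via the NiFi REST API (the setting exists since NiFi 1.8.0); the host
name, connection id, client id and revision version are placeholders you
would have to look up first (e.g. with GET /nifi-api/connections/<connection-id>),
and a secured instance would additionally need a token or client certificate:

curl -X PUT "https://nifi-host:8443/nifi-api/connections/<connection-id>" \
    -H "Content-Type: application/json" \
    -d '{
          "revision": { "clientId": "<client-id>", "version": <current-version> },
          "id": "<connection-id>",
          "component": {
            "id": "<connection-id>",
            "loadBalanceStrategy": "ROUND_ROBIN",
            "loadBalanceCompression": "DO_NOT_COMPRESS"
          }
        }'

In the UI it is simply the "Load Balance Strategy" setting in the
connection's configuration dialog.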

Regards
Stefan
On Fri, 2020-06-19 at 05:55 +0000, Josef.Zahner1@swisscom.com wrote:
> Hi guys,
>  
> We have faced strange behavior of the ConsumeKafkaRecord processor
> (and its counterpart ConsumeKafka). We have a Kafka topic with 15
> partitions and a producer which inserts, via NiFi, about 40k records
> per second into the topic at peak. The thing is, it doesn’t matter
> whether we use the 8-node cluster or configure execution on “Primary
> Node”, the performance is terrible. We ran a test with execution on
> “Primary Node” and started with one thread; you can see the result
> below. As soon as we reached 3 threads the performance went down and
> never got any higher, no matter how many threads or cluster nodes we
> used. We tried 2 threads on the 8-node cluster (16 threads in total)
> and even more. It didn’t help, we were stuck at 12’000’000 –
> 14’000’000 records per 5 min (roughly 45k records per second). By
> the way, for the tests the consumer was always behind the latest
> offset, so there were plenty of messages waiting in the Kafka topic.
> 
> We also tested with the performance script which comes with Kafka.
> It showed 250k messages/s without any tuning at all (although of
> course without any decoding of the messages). So in theory Kafka and
> the network in between can’t be the culprit; it must be something
> within NiFi.
>  
> [user@nifi ~]$ /opt/kafka_2.12-2.3.1/bin/kafka-consumer-perf-test.sh \
>     --broker-list kafka.xyz.net:9093 --group nifi --topic events \
>     --consumer.config /opt/sbd_kafka/credentials_prod/client-ssl.properties \
>     --messages 3000000
>  
> start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
> 2020-06-15 17:20:05:273, 2020-06-15 17:20:20:429, 515.7424, 34.0289, 3000000, 197941.4093, 3112, 12044, 42.8215, 249086.6822
>  
>  
> We have also seen that “Max Poll Records” in our case never gets
> reached; we had at most about 400 records in one FlowFile even
> though we configured 100’000 - which could be part of the problem.
>  
> 
>  
> It seems I’m not alone with this issue, even though his performance
> was even worse than ours:
> https://stackoverflow.com/questions/62104646/nifi-poor-performance-of-consumekafkarecord-2-0-and-consumekafka-2-0
>  
> Any help would be really appreciated.
>  
> If nobody has an idea, I’ll have to open a bug ticket :-(.
>  
> Cheers, Josef
>  

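On the “Max Poll Records” observation in the quoted message: how many
records a consumer actually gets back per poll also depends on the
consumer's fetch settings, not only on max.poll.records, so it may be
worth checking whether larger fetches change anything outside of NiFi
first. Below is a rough, untested sketch that re-runs the same perf-test
script with a copy of the SSL properties file plus a few standard Kafka
consumer properties (max.poll.records, fetch.min.bytes,
fetch.max.wait.ms); the values are illustrative only, not recommendations:

# Copy the existing client properties and add consumer fetch/poll overrides
cp /opt/sbd_kafka/credentials_prod/client-ssl.properties /tmp/perf-test.properties
cat >> /tmp/perf-test.properties <<'EOF'
max.poll.records=100000
fetch.min.bytes=1048576
fetch.max.wait.ms=500
EOF

# Same perf-test invocation as in the quoted mail, pointed at the copy
/opt/kafka_2.12-2.3.1/bin/kafka-consumer-perf-test.sh \
    --broker-list kafka.xyz.net:9093 \
    --group nifi \
    --topic events \
    --consumer.config /tmp/perf-test.properties \
    --messages 3000000

If the raw consumer still performs well with these settings, that points
at the NiFi side rather than at the brokers or the network, which matches
the conclusion in the quoted mail.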