kafka-users mailing list archives

From Sachin Mittal <sjmit...@gmail.com>
Subject Need some help in identifying some important metrics to monitor for streams
Date Thu, 02 Mar 2017 11:54:09 GMT
Hello All,
I have a few questions regarding monitoring of a Kafka Streams application and
which important metrics we should collect in our case.

Just a brief overview: we have a single-threaded application (0.10.1.1)
reading from a single-partition topic, and it is working fine.
We also have the same application (using 0.10.2.0) running multi-threaded,
with 4 threads per machine on a 3-machine cluster, reading from the same
topic but partitioned into 12 partitions.
Thus each thread processes a single partition, the same as in the earlier
case.

The new setup also works fine in steady state, but under load it somehow
triggers frequent rebalances, and then we run into all sorts of issues, like
a stream thread dying due to a CommitFailedException or entering a deadlock
state.
After a while we restart all the instances; then it works fine for a while,
until we hit the same problem again, and so on.

1. So, just for monitoring: when the first thread fails, what would be some
important metrics we should collect to get a sense of what is going
on?
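For what it's worth, the crudest thing I have considered so far is a log
watchdog along these lines (the log path and message text below are my
assumptions for illustration, not anything Streams guarantees):

```shell
# Crude watchdog sketch: count CommitFailedException lines in the app log
# to notice a dying stream thread from a cron job.
# The log path and log line below are assumptions for illustration only.
log=/tmp/streams-app.log
printf 'ERROR stream-thread died: org.apache.kafka.clients.consumer.CommitFailedException\n' > "$log"
grep -c 'CommitFailedException' "$log"   # prints 1 for the sample line above
```

But I would much rather watch a proper metric than grep logs, hence the
question.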

2. Is there any metric that tells us the time elapsed between successive poll
requests, so we can monitor that?
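What I have in mind is an estimate like the one below: if a thread-level
polls-per-second metric is readable over JMX (I am assuming a `poll-rate`
attribute under `kafka.streams:type=stream-metrics`; please correct me if the
name is different), its inverse would approximate the average gap between
successive polls:

```shell
# Sketch: estimate the average seconds between successive poll() calls
# as the inverse of a polls-per-second rate read over JMX.
poll_rate=2.5   # example value; would come from a jmxterm "get" in practice
awk -v r="$poll_rate" 'BEGIN { printf "%.3f\n", 1 / r }'   # prints 0.400
```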

Also, I monitored the RocksDB put and fetch times for these two instances,
and here is the output I get:
0.10.1.1
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-put-avg-latency-ms
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
206431.7497615029
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-fetch-avg-latency-ms
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
2595394.2746129474
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-put-qps
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
232.86299499317252
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1 key-table-fetch-qps
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-advice-1-StreamThread-1:
373.61071016166284

The same values for 0.10.2.0:
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-put-latency-avg
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
1199859.5535022356
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-fetch-latency-avg
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
3679340.80748852
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-put-rate
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
56.134778706069184
$>get -s -b kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1 key-table-fetch-rate
#mbean = kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1:
136.10721427931827

I notice that the results for 0.10.2.0 are much worse than those for 0.10.1.1.

I would like to know:
1. Is there any benchmark for RocksDB on the rate/latency at which it should
be doing put/fetch operations?

2. What could be the cause of the inferior numbers in 0.10.2.0? Is it because
this application is also running three other threads doing the same thing?

3. Also, what is with the name
new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1?
    I wanted to use this as part of my cron job, so why can't we have a
simpler name like in 0.10.1.1, so that the script is easy to write?
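Right now the best I can do in the script is to discover the generated
client-id by pattern matching on the full mbean name (however the mbean list
is obtained, e.g. from a jmxterm listing), rather than hard-coding the UUID:

```shell
# Extract the auto-generated client-id from a full mbean name so the
# cron script does not have to hard-code the embedded UUID.
mbean='kafka.streams:type=stream-rocksdb-window-metrics,client-id=new-part-advice-d1094e71-0f59-45e8-98f4-477f9444aa91-StreamThread-1'
echo "$mbean" | sed -n 's/.*client-id=\([^,]*\).*/\1/p'
```

This works, but a stable, configurable name as in 0.10.1.1 would be far
simpler.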

Thanks
Sachin
