storm-user mailing list archives

From "Adam Meyerowitz (BLOOMBERG/ 731 LEX)" <>
Subject Re: [DISCUSS] Would like to make collective intelligence about Metrics on Storm
Date Fri, 06 May 2016 13:29:48 GMT
I recall seeing in another thread a discussion about monitoring metrics for various queues
within a worker.  For us this would be pretty key for each executor input and output LMAX
queue as well as the worker level input and output queues.  In our topologies we run one task
per executor so it would help us get a much better understanding of the performance of our
components.  If acking is turned off, which it is for our topologies, it's hard to get the
full picture of the performance of the various components we have.  The execute and process
latencies tell only part of a larger story.  For the queues, generally we would like to see
queue utilization and how long tuples stayed on the queue.
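
A minimal sketch of the two queue measurements mentioned above (utilization and time spent on the queue). This is not Storm's queue implementation; `TimedQueue`, its methods, and the injectable `clock` are hypothetical names used only for illustration.

```python
import time
from collections import deque


class TimedQueue:
    """Bounded FIFO that records how long each item waited (hypothetical helper)."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock          # injectable so the example is deterministic
        self._items = deque()
        self.wait_times = []        # one entry per dequeued item, in clock units

    def put(self, item):
        if len(self._items) >= self.capacity:
            raise OverflowError("queue full")
        # Stamp the item with its enqueue time.
        self._items.append((self.clock(), item))

    def get(self):
        enqueued_at, item = self._items.popleft()
        # Time on queue = dequeue time minus enqueue time.
        self.wait_times.append(self.clock() - enqueued_at)
        return item

    def utilization(self):
        # Fraction of the capacity currently occupied.
        return len(self._items) / self.capacity


# Fake clock: two puts at t=0 and t=1, two gets at t=5 and t=7.
ticks = iter([0.0, 1.0, 5.0, 7.0])
q = TimedQueue(capacity=4, clock=lambda: next(ticks))
q.put("a")
q.put("b")
fill = q.utilization()
first = q.get()
second = q.get()
```

In a real worker the wait times would be fed into a metric rather than kept in a list, and the timestamping cost itself would need to be measured.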

Also generally we would like more than the average.  For example, min/max/average/standard deviation,
percentiles, whatever.  The average definitely smooths the bumps, and it's good, but we'd gain more
insight by understanding outliers and the larger performance picture.
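
As a rough illustration of how an average can hide outliers, the sketch below summarizes a made-up list of latencies with Python's standard `statistics` module. The `summarize` function and the sample values are assumptions for the example, not data from a real topology.

```python
import statistics


def summarize(latencies_ms):
    """Return min/max/mean/stdev and two percentiles for a list of latencies."""
    ordered = sorted(latencies_ms)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p50": cuts[49],
        "p99": cuts[98],
    }


# Hypothetical data: 98 fast tuples and two slow outliers.
samples = [10] * 98 + [500, 1000]
summary = summarize(samples)
```

Here the mean lands well above what a typical tuple sees, while the median stays at the typical value and the maximum and p99 expose the outliers.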

From: At: Apr 20 2016 00:30:05
Subject: Re: [DISCUSS] Would like to make collective intelligence about Metrics on Storm

Let me start by sharing my thoughts. :)

1. Need to enrich docs about metrics / stats.

In fact, when I was new to Apache Storm, I couldn't find in the docs that topology stats are
sampled by default and that the sample rate is 0.05. That misled me and made me ask, "Why is
there a difference between the counts?" I have also seen some mails on user@ asking the
same question. It would be better if we included this in the guide docs.
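
To show why sampled counts can differ from the real ones, here is a simplified model, not Storm's actual code. It assumes that out of every 20 tuples (1 / 0.05) one randomly chosen tuple is sampled, and that the counter is increased by 20 for each sampled tuple. The function name and the exact sampling scheme are assumptions for illustration.

```python
import random


def sampled_count(n_tuples, sample_rate=0.05, seed=0):
    """Estimate a tuple count by sampling one tuple per window of 1/sample_rate."""
    rng = random.Random(seed)
    period = int(round(1 / sample_rate))   # 20 for a rate of 0.05
    estimate = 0
    target = -1
    for i in range(n_tuples):
        pos = i % period
        if pos == 0:
            # Start of a new window: pick which position in it gets sampled.
            target = rng.randrange(period)
        if pos == target:
            # Each sampled tuple stands in for a whole window.
            estimate += period
    return estimate


exact_windows = sampled_count(1000)    # 50 full windows
partial_window = sampled_count(1010)   # 50 full windows plus 10 extra tuples
```

Under this model the estimate is always a multiple of 20, so it matches the real count only when the number of tuples is a multiple of 20 and otherwise drifts by up to one window.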

The Metrics documentation page also doesn't seem well written. I think it has appropriate headings
but lacks content under each heading.
That should be addressed, and introducing some external metrics consumer plugins (like storm-graphite
from Verisign) would be great, too.

2. Need to increase the sample rate or (ideally) not sample at all.

Let's set aside the performance hit for now.
Ideally, we expect the precision of metrics to improve as we increase the sample rate. This affects
the non-gauge kinds of metrics, such as counters and latencies.

Btw, I would like to hear opinions on latency since I'm not an expert.
Storm provides only average latency, and it is indeed based on the sample rate. Are we OK with
this? If not, how much would also having percentiles help us?

Jungtaek Lim (HeartSaVioR)

On Wed, Apr 20, 2016 at 10:55 AM, Jungtaek Lim <> wrote:

Hi Storm users,

I'm Jungtaek Lim, committer and PMC member of Apache Storm.

If you subscribe to the dev@ mailing list, you may have seen that we have recently been working on
the metrics feature of Apache Storm.

For now, improvements are moving forward based on the current metrics feature.

- Improve (Topology) MetricsConsumer
- Provide topology metrics in detail (metrics per stream)
- (WIP) Introduce Cluster Metrics Consumer

As I don't maintain a large cluster myself, I really want to collect any ideas for improvement,
any inconveniences, and any use cases of Metrics from community members, so that we're heading
in the right direction.

Let's talk!

Thanks in advance,
Jungtaek Lim (HeartSaVioR)
