storm-user mailing list archives

From "Nikos R. Katsipoulakis" <nick.kat...@gmail.com>
Subject Re: Complete Latency Vs. Throughput--when do they not change in same direction?
Date Fri, 01 Apr 2016 13:54:19 GMT
Hello John,

I have to say that a system's telemetry is rarely easy to interpret. That said, let us try to deduce what in your use case might be causing the inconsistent performance metrics.

First, I would like to ask whether your KafkaSpouts produce tuples at the same rate. In other words, do you produce or read data in a deterministic (replayable) way, or do you attach your KafkaSpout to an uncontrollable source of data (like a Twitter feed, a news feed, etc.)? I am asking because knowing what happens at the source of your data (in terms of input rate) is really important. If your use case involves a varying input rate for your sources, I would suggest picking a particular snapshot of that source and replaying your experiments, to check whether the variance in latency/throughput still exists.
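To illustrate what I mean by a replayable snapshot, here is a minimal sketch (the class name and numbers are hypothetical, not from any Storm API): a generator with a fixed seed feeds every run the identical tuple sequence, so run-to-run differences cannot come from the data source itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: a replayable "snapshot" of input tuples.
// With a fixed seed, every experiment run sees the identical sequence.
public class ReplayableSnapshot {
    public static List<Long> generate(long seed, int count) {
        Random rng = new Random(seed);        // fixed seed => deterministic
        List<Long> tuples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tuples.add(rng.nextLong());
        }
        return tuples;
    }

    public static void main(String[] args) {
        // Two "runs" with the same seed produce the same data.
        List<Long> runA = generate(42L, 5);
        List<Long> runB = generate(42L, 5);
        System.out.println(runA.equals(runB)); // true
    }
}
```

In a real test you would seed a Kafka topic once from such a snapshot (or a captured file) and re-consume it for every run, rather than reading the live source each time.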

The second point I would like to make is that throughput (or ack rate, as you correctly put it) can sometimes depend on the data you are pushing. For instance, a computation-heavy task might take more time for one value distribution than for another. Therefore, please make sure that the data you send into the system always causes the same amount of computation.
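A toy example of what I mean (the per-tuple cost model here is made up for illustration): two batches with the same tuple count can cost very different amounts of work when the cost depends on the values themselves.

```java
public class SkewedWork {
    // Toy "CPU-heavy" task whose cost grows with the tuple's value:
    // it loops value times. Real bolts can behave the same way when,
    // e.g., string length or key skew drives the work per tuple.
    public static long process(int value) {
        long acc = 0;
        for (int i = 0; i < value; i++) acc += i;
        return acc;
    }

    // Total loop iterations the batch will trigger in process().
    public static long totalIterations(int[] batch) {
        long iters = 0;
        for (int v : batch) iters += v;
        return iters;
    }

    public static void main(String[] args) {
        int[] lightBatch = {10, 10, 10, 10};   // same tuple count...
        int[] heavyBatch = {10, 10, 10, 4000}; // ...but skewed values
        System.out.println(totalIterations(lightBatch)); // 40
        System.out.println(totalIterations(heavyBatch)); // 4030
    }
}
```

So two runs that ack the same number of tuples per minute can still do very different amounts of work, which shows up as different latencies.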

Third, seeing throughput and latency drop at the same time immediately points to a drop in input rate. Think about it: if I send in tuples at a lower input rate, I expect throughput to drop (since fewer tuples enter the system), and at the same time the heavy computation has less data to work with (so end-to-end latency also drops). Does that make sense to you? Can you verify that you had consistent input rates across the different runs?
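A back-of-the-envelope way to see this is a simple M/M/1 queueing sketch. This is an assumption for illustration, not a model of Storm's internals, and the rates below are invented: in a stable queue, throughput equals the arrival rate, and mean sojourn time is 1 / (mu - lambda), so halving the arrival rate halves throughput while latency can shrink even more.

```java
public class InputRateEffect {
    // M/M/1 sketch (an assumption, not a Storm model): with service
    // rate mu and arrival rate lambda < mu, steady-state throughput
    // equals lambda and mean time in system is W = 1 / (mu - lambda).
    public static double latencySeconds(double lambda, double mu) {
        return 1.0 / (mu - lambda);
    }

    public static void main(String[] args) {
        double mu = 1000.0;       // tuples/sec the topology can process
        double yesterday = 800.0; // arrival rate, run 1
        double today = 400.0;     // arrival rate, run 2 (half the input)

        // Throughput halves with the input rate...
        System.out.println(today / yesterday);               // 0.5
        // ...while latency drops to one third: 1/600 s vs 1/200 s.
        System.out.println(latencySeconds(today, mu)
                / latencySeconds(yesterday, mu));            // ~0.333
    }
}
```

With these invented numbers, throughput halves while latency falls to a third, so the combination you observed is entirely consistent with a lower input rate alone.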

Finally, I would suggest that you let Storm warm up and drop your initial metrics. In my experience with Storm, latency and throughput at the beginning of a run (until all buffers fill up) are highly variable, and therefore not reliable data points to include in your analysis. You can verify this claim by plotting your data over time.
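Concretely, the trimming step can be as simple as the sketch below (the sample values and the warm-up cutoff are made up; in practice you would pick the cutoff from your over-time plot):

```java
import java.util.Arrays;

public class WarmupTrim {
    // Drop the first warmupCount samples before averaging, so the
    // highly variable start-up period does not skew the result.
    public static double meanAfterWarmup(double[] latenciesMs, int warmupCount) {
        return Arrays.stream(latenciesMs)
                .skip(warmupCount)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // First three samples taken while buffers were still filling.
        double[] latenciesMs = {250, 180, 120, 40, 42, 38, 40};
        System.out.println(meanAfterWarmup(latenciesMs, 0)); // ~101.4 (skewed)
        System.out.println(meanAfterWarmup(latenciesMs, 3)); // 40.0
    }
}
```

The untrimmed mean is more than double the steady-state mean here, which is exactly the kind of distortion the warm-up samples introduce.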

Thanks,
Nikos

On Fri, Apr 1, 2016 at 9:16 AM, John Yost <hokiegeek2@gmail.com> wrote:

> Hi Everyone,
>
> I am a little puzzled by what I am seeing in some testing with a topology
> I have where the topo is reading from a KafkaSpout, doing some CPU
> intensive processing, and then writing out to Kafka via the standard
> KafkaBolt.
>
> I am doing testing in a multi-tenant environment and so test results can
> vary by 10-20% on average.  However, results are much more variable the
> last couple of days.
>
> The big thing I am noticing: whereas the throughput--as measured in tuples
> acked/minute--is half today of what it was yesterday for the same
> configuration, the Complete Latency (total time a tuple is in the topology
> from the time it hits the KafkaSpout to the time it is acked in the
> KafkaBolt) today is a third of what it was yesterday.
>
> Any ideas as to how the throughput could go down dramatically at the same
> time the Complete Latency is improving?
>
> Thanks
>
> --John
>



-- 
Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh
