storm-user mailing list archives

From: "Thomas Cooper (PGR)" <t.coo...@newcastle.ac.uk>
Subject: Complete latency calculation
Date: Tue, 31 Jan 2017 15:55:36 GMT

Hi,


First a bit of background:


I am a PhD student working on modelling the performance of Storm topologies. I am having reasonable
success in modelling the complete latency; however, depending on the load (throughput), my
predictions can be off by up to 50%.


After ruling out all other possible sources of latency (mostly remote transfer delays between
workers on separate machines), it seems the final path of ack tuples, from the end component
to the Acker and through to the Spout, is the last source of unaccounted latency. Under heavy load
and with relatively few spout tasks, each task will be busy calling nextTuple() and
acking, so ack complete messages may back up at the spout. This will artificially extend the
complete latency. As the spout does not report metrics for the delay/processing of acks, I
cannot account for this effect in my models.
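
To make the effect concrete, here is a toy, deterministic sketch in Python. It is not Storm
code and not how the executor is actually implemented. It assumes a single-threaded spout
loop that first handles any acks that have already arrived and then makes one next-tuple call,
and that every tuple takes the same fixed time to travel through the topology and back. The
function name `simulate` and all of its parameters are hypothetical, chosen only for illustration.

```python
from collections import deque


def simulate(num_tuples, next_tuple_cost, ack_cost, transit_time):
    """Toy model of one spout task; returns the latency measured at the spout per tuple.

    All arguments are assumptions of this sketch, in arbitrary time units:
      next_tuple_cost: time the spout thread spends in one next-tuple call
      ack_cost:        time the spout thread spends handling one ack
      transit_time:    time from emit until the ack message reaches the spout
    """
    clock = 0.0
    in_flight = deque()  # (ack_arrival_time, emit_time), kept in emit order
    measured = []
    emitted = 0

    while len(measured) < num_tuples:
        # Handle every ack that has already arrived at the spout.
        while in_flight and in_flight[0][0] <= clock:
            _, emit_time = in_flight.popleft()
            clock += ack_cost
            measured.append(clock - emit_time)

        if emitted < num_tuples:
            # The thread is busy emitting; acks arriving now have to wait.
            clock += next_tuple_cost
            in_flight.append((clock + transit_time, clock))
            emitted += 1
        elif in_flight:
            # Nothing left to emit: idle until the next ack arrives.
            clock = max(clock, in_flight[0][0])

    return measured


if __name__ == "__main__":
    latencies = simulate(num_tuples=5, next_tuple_cost=4, ack_cost=1, transit_time=10)
    print(latencies)
```

With these made-up numbers the first two tuples are measured at 13 and 14 time units, although
their transit time plus ack handling is only 11. The difference is time the ack spent waiting
behind next-tuple calls, which is the component I currently cannot observe.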


I thought I might have to resort to implementing my own spout (a custom Storm fork is something
I would prefer to avoid). However, after seeing issue 1742 (https://issues.apache.org/jira/browse/STORM-1742)
it seems Jungtaek Lim and the Storm devs have already spotted this problem and implemented
a solution in the master and 1.x branches. Having the Ackers stop the complete latency clock
makes more sense (particularly under heavy load) and brings the complete latency closer to
the sojourn time (spout to final component) through the whole topology.


However, I was hoping to get these models working with the latest Storm release (1.0.2), and it
appears that these changes have not been backported to the 1.0.x branch yet.


My Question (TL;DR):


Where in the 1.0.x codebase does the ack_ack message to the spout tasks get processed? I know
that implementations of ISpout have an ack() method that gets called. However, in my test
topologies, when I leave this method unimplemented, the system still reports a complete latency
for that spout. The timestamp in the ack_ack message must be getting processed somewhere,
but I am struggling to identify where.


Any help locating this would be most appreciated.


Regards,


Thomas Cooper
PhD Student
Newcastle University, School of Computer Science
W: http://www.tomcooper.org.uk | A: 4th Floor, The Core, Science Central, Bath Lane, Newcastle
upon Tyne, NE4 5TF
