storm-user mailing list archives

From William Oberman <ober...@civicscience.com>
Subject ShellSpout hangs on reportError?
Date Fri, 06 Feb 2015 14:48:37 GMT
Hi,

For reference, I'm talking about 0.9.3 ShellSpout, line 234.

I'll try to cover the important facts that led to this issue:

-I was on 0.9.2 using multilang to bridge to PHP to get to some existing
business logic

-I'm testing the 0.9.3 upgrade (yes, I see the new heartbeat addition to
the ShellBolt protocol)

-I have some odd topologies where I try to do some legacy background
processing.  This processing takes a highly variable amount of time in the
Bolts, from milliseconds to minutes.  Eventually, due to randomness, the
spout's "pending" pool fills up, causing the spout to block on nextTuple,
which in turn causes a heartbeat timeout. (I believe my only fix is to
increase the heartbeat timeout at the topology level. That's not the
purpose of this email, though confirmation of this as my only workaround
would be appreciated!  I feel like this wasn't anticipated when the
heartbeat patch was designed; I guess it was assumed the spout's nextTuple
wouldn't block?)

-The purpose of this email is the fact that the topology "jams up" when the
ShellSpout has a heartbeat timeout.  I can see my PHP spout/bolt still
running (I added logging to them), but Storm itself is doing nothing.

-I added logging to ShellSpout and recompiled, because I saw the log
message on line 233 (Halting process: ShellSpout died) but, as noted, the PHP
process was still running, so I was curious whether _process.destroy();
failed.  But my logging didn't appear.  I assumed I was compiling/deploying
wrong.  Eventually I commented out line 234:
_collector.reportError(exception); and everything started working!!!
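
For the heartbeat-timeout workaround mentioned in the list above, here is a
minimal sketch of what I mean by raising the timeout at the topology level.
It uses a plain Map rather than Storm's Config class so that it stands
alone. The key name "topology.subprocess.timeout.secs" is my assumption of
the setting that governs the multilang heartbeat, and the class name, helper
name, and the 600 second value are purely illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatTimeoutConfig {
    // ASSUMPTION: this is the topology-level key for the multilang
    // subprocess heartbeat timeout. Verify it against your Storm version.
    public static final String SUBPROCESS_TIMEOUT_KEY =
            "topology.subprocess.timeout.secs";

    /** Returns a copy of the given topology conf with the timeout set. */
    public static Map<String, Object> withSubprocessTimeout(
            Map<String, Object> base, int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        Map<String, Object> conf = new HashMap<String, Object>(base);
        conf.put(SUBPROCESS_TIMEOUT_KEY, seconds);
        return conf;
    }

    public static void main(String[] args) {
        // 600 seconds is an arbitrary value chosen to exceed my slowest tasks.
        Map<String, Object> conf =
                withSubprocessTimeout(new HashMap<String, Object>(), 600);
        System.out.println(conf);
    }
}
```

The resulting map would be what gets passed as the topology conf at submit
time.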

Does this make *any* sense?  Why would _collector.reportError(exception);
block and never return (I waited quite a long time, tens of minutes)?  When
I comment out line 234, Storm immediately kills my bad tasks and respawns
them almost instantly.
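
One way to confirm that a call like this really never returns, rather than
my logging simply being lost, would be to run it on a separate thread with a
deadline. This is only a sketch of that idea: BlockingCallProbe and
completesWithin are hypothetical names, and the Runnable in main is a
stand-in for the suspect call, not actual Storm code:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingCallProbe {
    /**
     * Runs the call on a daemon thread and reports whether it returned
     * (normally or by throwing) within the given timeout.
     */
    public static boolean completesWithin(Runnable call, long timeout,
                                          TimeUnit unit) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "blocking-call-probe");
            t.setDaemon(true); // a hung call must not keep the JVM alive
            return t;
        });
        Future<?> result = pool.submit(call);
        try {
            result.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort; a truly stuck call may ignore it
            return false;
        } catch (ExecutionException e) {
            return true; // the call threw, so it did not hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the suspect call: sleeps far longer than the deadline.
        Runnable suspect = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // cancelled by the probe
            }
        };
        System.out.println("returned in time: "
                + completesWithin(suspect, 200, TimeUnit.MILLISECONDS));
    }
}
```

If the probe times out, a thread dump of the worker (for example with
jstack) should show what the stuck thread is waiting on.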

I feel fairly confident that this will be reproducible.  My topology:
-1 spout (ShellSpout)
-1 bolt (ShellBolt)
-The ShellSpout hits a heartbeat timeout because tasks in the ShellBolt are
slow and the pending queue is full

Thanks for any feedback!

will
