Thanks for the quick response. According to the Storm documentation, if a worker or node dies it is automatically restarted. Also, the bolts still show up in the Storm UI; they just don't seem to be processing any data. The link you mentioned would have been a great help, but we're stuck on an old version that doesn't have those features, and upgrading is not an option.
What other reasons could cause a bolt to hang completely while the rest of the topology works fine?
On a side note, having 8 bolts seems like a rather complicated setup, assuming the layout is Spout ---> Bolt1 ---> Bolt2 ---> Bolt3 ---> and so on ---> Bolt8. A chain that long takes too long to ack. I'd recommend a design change.
The last time I encountered crashes that left no error messages was when the OS killed a process that took up too much processing power. This gets worse on Ubuntu systems, where no log about the OOM killer is registered, even in the system logs.
For debugging Storm, there are these options: https://community.hortonworks.com/articles/36151/debugging-an-apache-storm-topology.html
On Thu, Aug 4, 2016 at 2:35 PM, Abhishek Raj <firstname.lastname@example.org> wrote:
Hi.
We are using Storm 0.9.4. Our topology consists of a linear chain of 1 spout and 8 bolts. In the 4th bolt we call an external bolt written in PHP, which emits to the 5th bolt after some processing.
We are seeing that after some time, the 6th, 7th and 8th bolts completely stop processing. The executed, acked, emitted and transferred numbers drop to zero for these bolts, and there are no error messages in the worker logs. Other bolts still seem to be processing data and emitting, but the last 3 bolts halt completely and do no processing. The failed count keeps increasing on the Kafka spout, but the failed count of the individual bolts remains 0.
We already tried increasing the tuple timeout threshold and decreasing max-spout-pending, to no avail. Eventually, the bolts completely stopped processing. We are not really sure whether it has something to do with the external PHP bolt that we call, because it still seems to be processing data fine and sends heartbeats.
Any pointers on how to go about debugging this would be great.
--
Abhishek