storm-user mailing list archives

From Martin Illecker <millec...@apache.org>
Subject Re: Urgent - Some workers stop processing after a few seconds
Date Thu, 26 Feb 2015 17:30:41 GMT
Hi,

I believe this issue lies with Storm or EC2, because my topology operates fine
on a single node (one worker).

I have tried different combinations of the following parameters:
 - *shuffleGrouping* and *allGrouping* between the spout and the first bolt
 - spout parallelism from 1 up to numberOfWorkers (so that each worker has its
own spout task)
 - maxSpoutPending from 5000 down to 50
 - a 1 ms sleep in the spout
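
For concreteness, this is roughly where those knobs live in the topology setup
(a sketch only; the spout/bolt variables and numberOfWorkers are placeholders,
not my actual code):

import backtype.storm.Config;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
// Spout parallelism hint: tried 1 up to numberOfWorkers (at most one spout task per worker).
builder.setSpout("dataset-spout", spout, numberOfWorkers);
// shuffleGrouping vs. allGrouping between the spout and the first bolt.
builder.setBolt("tokenizer-bolt", tokenizerBolt, numberOfWorkers)
    .shuffleGrouping("dataset-spout");   // or .allGrouping("dataset-spout")

Config conf = new Config();
conf.setMaxSpoutPending(50);             // tried values from 5000 down to 50
conf.setNumWorkers(numberOfWorkers);

// The 1 ms sleep sits inside the spout's nextTuple(), e.g. Utils.sleep(1) before emitting.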

The issue occurs when one spout with parallelism 1 has to feed multiple
workers.
For example, 5 workers with one spout of parallelism 1 and a bolt with
parallelism 5.
After a few seconds, 4 of these 5 workers become idle and only one worker
keeps processing.
That one is probably the worker running the spout task.

If I increase the parallelism of the spout, then the performance drops
dramatically, but all workers keep working.

There are no error messages in the worker or supervisor log.

> You've maxSpoutPending set to 2k tuples; do you see anywhere in your bolt
> code that can be hanging before acking the tuple?

I thought I would receive an exception or a timeout if the bolt is hanging?
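
As far as I know, a bolt that hangs before acking does not throw anything on
its own; the tuple only fails back at the spout after
topology.message.timeout.secs (30 s by default), and with maxSpoutPending set
the spout simply stops emitting once the cap of unacked tuples is reached. For
reference, this is roughly the acking pattern in question (a hypothetical bolt
for illustration, not the real one from my topology):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class ExampleBolt extends BaseRichBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple input) {
    // If anything here blocks (I/O, locks, a slow model call), the tuple is never
    // acked and, with maxSpoutPending = 2000, the spout stalls once 2000 tuples
    // are in flight, without any exception showing up in the worker log.
    collector.emit(input, new Values(input.getString(0).toLowerCase()));
    collector.ack(input);  // missing ack: the tuple only fails after the message timeout
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("text"));
  }
}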

Please have a look at the full source of my topology:
https://github.com/millecker/storm-apps/blob/master/sentiment_analysis_svm/src/at/illecker/storm/sentimentanalysis/svm/SentimentAnalysisSVMTopology.java

Thanks!

2015-02-26 17:31 GMT+01:00 Harsha <storm@harsha.io>:

>  Martin,
>          I can't find anything wrong in the logs or in your TopologyBuilder
> code. In your bolt code, how are you doing the acking of the tuples? You've
> maxSpoutPending set to 2k tuples; do you see anywhere in your bolt code
> that can be hanging before acking the tuple?
>
> -Harsha
>
> On Wed, Feb 25, 2015, at 09:02 AM, Martin Illecker wrote:
>
> How can I find out why the workers do not get any tuples anymore after they
> have successfully processed a few thousand?
>
> I have also tested *allGrouping* to ensure that every bolt must
> receive tuples.
> But two workers, containing two bolts, stop receiving tuples after a few
> seconds.
>
> I would appreciate any help!
>
>
>
> 2015-02-25 17:40 GMT+01:00 Harsha <storm@harsha.io>:
>
>
> My bad, I was looking at another supervisor.log. There are no errors in the
> supervisor and worker logs.
>
>
> -Harsha
>
>
> On Wed, Feb 25, 2015, at 08:29 AM, Martin Illecker wrote:
>
> Hi Harsha,
>
> I'm using three c3.4xlarge EC2 instances:
>  1) Nimbus, WebUI, Zookeeper, Supervisor
>  2) Zookeeper, Supervisor
>  3) Zookeeper, Supervisor
>
> I cannot find this error message in my attached supervisor log.
> By the way, I'm running on Ubuntu EC2 nodes and there is no path C:\.
>
> I have not made any changes to these timeout values; they should be the
> defaults:
> storm.zookeeper.session.timeout: 20000
> storm.zookeeper.connection.timeout: 15000
> supervisor.worker.timeout.secs: 30
>
> Thanks!
> Best regards
> Martin
>
>
> 2015-02-25 17:03 GMT+01:00 Harsha <storm@harsha.io>:
>
>
> Hi Martin,
>             Can you share your storm.zookeeper.session.timeout,
> storm.zookeeper.connection.timeout, and supervisor.worker.timeout.secs?
> Looking at the supervisor logs I see:
> Error when processing event
> java.io.FileNotFoundException: File
> 'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>
> You might be running into https://issues.apache.org/jira/browse/STORM-682.
> Is your ZooKeeper cluster on a different set of nodes, and can you check that
> you are able to connect to it without any issues?
> -Harsha
>
>
>
> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>
> Hi,
>
> I'm still observing this strange issue.
> Two of the three workers stop processing after a few seconds (each worker is
> running on its own dedicated EC2 node).
>
> My guess would be that the output stream of the one spout is not properly
> distributed over all three workers, or is somehow directed to one worker only.
> But *shuffleGrouping* should guarantee equal distribution among multiple
> bolts, right?
>
> I'm using the following topology:
>
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("dataset-spout", spout);
> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>     .shuffleGrouping("dataset-spout");
> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>     .shuffleGrouping("tokenizer-bolt");
>
> conf.setMaxSpoutPending(2000);
> conf.setNumWorkers(3);
>
> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>
> I have attached the screenshots of the topology and the truncated worker
> and supervisor log of one idle worker.
>
> The supervisor log includes a few interesting lines, but I think they are
> normal?
>
> supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>
> I hope someone can help me with this issue!
>
> Thanks
> Best regards
> Martin
>
>
> 2015-02-24 20:37 GMT+01:00 Martin Illecker <millecker@apache.org>:
>
> Hi,
>
> I'm trying to run a topology on EC2, but I'm observing the following
> strange issue:
>
> Some workers stop processing after a few seconds, without any error in the
> worker log.
>
> For example, my topology consists of 3 workers and each worker is running
> on its own EC2 node.
> Two of them stop processing after a few seconds, even though they have already
> processed several tuples successfully.
>
> I'm using only one spout and shuffleGrouping for all bolts.
> If I add more spouts, then all workers keep processing, but the performance
> is very poor.
>
> Does anyone have a guess why this happens?
>
> The topology is currently running at:
> http://54.155.156.203:8080
>
> Thanks!
>
> Martin
>
>
> Email had 4 attachments:
>
>    - topology.jpeg
>      161k (image/jpeg)
>    - component.jpeg
>      183k (image/jpeg)
>    - supervisor.log
>      7k (application/octet-stream)
>    - worker.log
>      37k (application/octet-stream)
>
