storm-user mailing list archives

From Harsha <st...@harsha.io>
Subject Re: Urgent - Some workers stop processing after a few seconds
Date Thu, 26 Feb 2015 16:31:58 GMT

Martin, I can't find anything wrong in the logs or in your TopologyBuilder
code. In your bolt code, how are you acking the tuples? You have max spout
pending set to 2,000 tuples; do you see anywhere in your bolt code where it
could hang before acking a tuple?

-Harsha
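
For reference, a minimal sketch of the explicit acking pattern Harsha is
asking about, assuming Storm 0.9.x (backtype.storm packages); the class,
stream, and field names here are hypothetical, not from Martin's topology.
With max spout pending at 2,000, any tuple that is never acked or failed
stays pending until the message timeout and counts against that limit,
which can stall the spout:

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class ExampleAckingBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context,
                OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            try {
                // Anchor the emitted tuple to its input so a downstream
                // failure replays it from the spout.
                collector.emit(input, new Values(input.getString(0)));
                // Ack on every code path; an un-acked tuple stays pending
                // until the message timeout and counts against max spout
                // pending.
                collector.ack(input);
            } catch (Exception e) {
                collector.fail(input); // fail fast instead of leaving it pending
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("text"));
        }
    }

A bolt extending BaseBasicBolt is acked automatically after execute()
returns; a BaseRichBolt like the sketch above must ack or fail every tuple
itself.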

On Wed, Feb 25, 2015, at 09:02 AM, Martin Illecker wrote:
> How can I find out why the workers stop receiving tuples after they have
> successfully processed a few thousand?
>
> I have also tested *allGrouping* to ensure that every bolt must receive
> tuples. But two workers, including their two bolts, stop receiving tuples
> after a few seconds.
>
> I would appreciate any help!
>
>
>
> 2015-02-25 17:40 GMT+01:00 Harsha <storm@harsha.io>:
>> My bad, I was looking at another supervisor.log. There are no errors in
>> the supervisor and worker logs.
>>
>>
>> -Harsha
>>
>>
>> On Wed, Feb 25, 2015, at 08:29 AM, Martin Illecker wrote:
>>> Hi Harsha,
>>>
>>> I'm using three c3.4xlarge EC2 instances:
>>>   1) Nimbus, WebUI, Zookeeper, Supervisor
>>>   2) Zookeeper, Supervisor
>>>   3) Zookeeper, Supervisor
>>>
>>> I cannot find this error message in my attached supervisor log. By the
>>> way, I'm running on Ubuntu EC2 nodes, so there is no path C:\.
>>>
>>> I have not made any changes to these timeout values. They should be
>>> the defaults:
>>>   storm.zookeeper.session.timeout: 20000
>>>   storm.zookeeper.connection.timeout: 15000
>>>   supervisor.worker.timeout.secs: 30
>>>
>>> Thanks!
>>> Best regards,
>>> Martin
>>>
>>>
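
For reference, a sketch of where those three settings live on the Java
side, assuming Storm 0.9.x; the constants below are from
backtype.storm.Config, and the values shown are the defaults Martin lists.
This is illustrative only: supervisor.worker.timeout.secs is read by the
supervisor daemon, so in practice it is changed in storm.yaml on the
cluster nodes rather than per topology.

    import backtype.storm.Config;

    Config conf = new Config();
    // ZooKeeper client timeouts, in milliseconds (defaults shown):
    conf.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 20000);
    conf.put(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT, 15000);
    // Seconds without a heartbeat before the supervisor restarts a worker.
    // Read by the supervisor daemon, so it only takes effect via storm.yaml
    // on the cluster nodes; shown here for reference.
    conf.put(Config.SUPERVISOR_WORKER_TIMEOUT_SECS, 30);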
>>> 2015-02-25 17:03 GMT+01:00 Harsha <storm@harsha.io>:
>>>> Hi Martin, can you share your storm.zookeeper.session.timeout,
>>>> storm.zookeeper.connection.timeout, and supervisor.worker.timeout.secs
>>>> values? Looking at the supervisor logs, I see:
>>>>   Error when processing event java.io.FileNotFoundException: File
>>>>   'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>>>> You might be running into
>>>> https://issues.apache.org/jira/browse/STORM-682. Is your Zookeeper
>>>> cluster on a different set of nodes, and can you check that you are
>>>> able to connect to it without any issues?
>>>>
>>>> -Harsha
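
One way to run that connectivity check from Java, sketched with the plain
ZooKeeper client; the host list is a placeholder for the actual
storm.zookeeper.servers entries, and 15000 ms mirrors
storm.zookeeper.connection.timeout:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkConnectivityCheck {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            // Placeholder host list; substitute the storm.zookeeper.servers
            // entries from your storm.yaml.
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000,
                    new Watcher() {
                        @Override
                        public void process(WatchedEvent event) {
                            if (event.getState() == Event.KeeperState.SyncConnected) {
                                connected.countDown();
                            }
                        }
                    });
            if (connected.await(15, TimeUnit.SECONDS)) {
                System.out.println("Connected, session 0x"
                        + Long.toHexString(zk.getSessionId()));
            } else {
                System.out.println("No ZooKeeper connection within 15 seconds");
            }
            zk.close();
        }
    }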
>>>>
>>>>
>>>>
>>>> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>>>>> Hi,
>>>>>
>>>>> I'm still observing this strange issue. Two of three workers stop
>>>>> processing after a few seconds (each worker is running on its own
>>>>> dedicated EC2 node).
>>>>>
>>>>> My guess would be that the output stream of the spout is not being
>>>>> properly distributed over all three workers, or is somehow directed
>>>>> to one worker only. But *shuffleGrouping* should guarantee equal
>>>>> distribution among multiple bolts, right?
>>>>>
>>>>> I'm using the following topology:
>>>>>
>>>>> TopologyBuilder builder = new TopologyBuilder();
>>>>> builder.setSpout("dataset-spout", spout);
>>>>> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>>>>>     .shuffleGrouping("dataset-spout");
>>>>> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>>>>>     .shuffleGrouping("tokenizer-bolt");
>>>>>
>>>>> conf.setMaxSpoutPending(2000);
>>>>> conf.setNumWorkers(3);
>>>>>
>>>>> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf,
>>>>>     builder.createTopology());
>>>>>
>>>>> I have attached screenshots of the topology and the truncated worker
>>>>> and supervisor logs of one idle worker.
>>>>>
>>>>> The supervisor log includes a few interesting lines, but I think
>>>>> they are normal?
>>>>>   supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>>>>>
>>>>> I hope, someone can help me with this issue!
>>>>>
>>>>> Thanks Best regards Martin
>>>>>
>>>>>
>>>>> 2015-02-24 20:37 GMT+01:00 Martin Illecker <millecker@apache.org>:
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to run a topology on EC2, but I'm observing the
>>>>>> following strange issue:
>>>>>>
>>>>>> Some workers stop processing after a few seconds, without any
>>>>>> error in the worker log.
>>>>>>
>>>>>> For example, my topology consists of 3 workers, each running on its
>>>>>> own EC2 node. Two of them stop processing after a few seconds, even
>>>>>> though they have already processed several tuples successfully.
>>>>>>
>>>>>> I'm using only one spout and shuffleGrouping at all bolts. If I add
>>>>>> more spouts, then all workers keep processing, but the performance
>>>>>> is very bad.
>>>>>>
>>>>>> Does anyone have a guess why this happens?
>>>>>>
>>>>>> The topology is currently running at: http://54.155.156.203:8080
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Email had 4 attachments:
>>>>>  * topology.jpeg 161k (image/jpeg)
>>>>>  * component.jpeg 183k (image/jpeg)
>>>>>  * supervisor.log 7k (application/octet-stream)
>>>>>  * worker.log 37k (application/octet-stream)
>>>>
>>>
>>
>

