storm-user mailing list archives

From Abhishek Agarwal <abhishc...@gmail.com>
Subject Re: Storm Metrics Consumer Not Receiving Tuples
Date Fri, 15 Apr 2016 17:31:16 GMT
Kevin,
That would explain it. A stuck bolt will stall the whole topology.
The metrics consumer runs as a bolt, so it will be blocked as well.

Excuse typos
On Apr 15, 2016 10:29 PM, "Kevin Conaway" <kevin.a.conaway@gmail.com> wrote:

> Two more data points on this:
>
> 1.) We are registering the Graphite MetricsConsumer on our Topology
> Config, not globally in storm.yaml.  I don't know if this makes a
> difference.
>
> 2.) We re-ran another test last night and it ran fine for about 6 hours
> until the Kafka brokers ran out of disk space (oops), which halted the
> test.  That time coincided exactly with when the Graphite instance
> stopped receiving metrics from Storm.  Given that we weren't processing any
> tuples while the test was halted, I understand why we didn't get those
> metrics, but shouldn't the __system metrics (like heap size and GC time)
> still have been sent?
>
> On Thu, Apr 14, 2016 at 10:09 PM, Kevin Conaway <kevin.a.conaway@gmail.com
> > wrote:
>
>> Thank you for taking the time to respond.
>>
>> In my bolt I am registering 3 custom metrics (each a ReducedMetric to
>> track the latency of individual operations in the bolt).  The metric
>> interval for each is the same as TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS,
>> which we have set to 60s.
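>>
>> For concreteness, the registration looks roughly like this (a minimal
>> sketch, not our actual code; the bolt and metric names are placeholders):
>>
>>     import java.util.Map;
>>     import backtype.storm.metric.api.MeanReducer;
>>     import backtype.storm.metric.api.ReducedMetric;
>>     import backtype.storm.task.OutputCollector;
>>     import backtype.storm.task.TopologyContext;
>>     import backtype.storm.topology.OutputFieldsDeclarer;
>>     import backtype.storm.topology.base.BaseRichBolt;
>>     import backtype.storm.tuple.Tuple;
>>
>>     public class CassandraWriterBolt extends BaseRichBolt {
>>         private transient ReducedMetric insertLatency;
>>         private transient OutputCollector collector;
>>
>>         @Override
>>         public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>>             this.collector = collector;
>>             // 60-second bucket, matching topology.builtin.metrics.bucket.size.secs
>>             insertLatency = context.registerMetric(
>>                 "cassandra-insert-latency-ms", new ReducedMetric(new MeanReducer()), 60);
>>         }
>>
>>         @Override
>>         public void execute(Tuple tuple) {
>>             long start = System.currentTimeMillis();
>>             // ... transform the message and write it to Cassandra ...
>>             insertLatency.update(System.currentTimeMillis() - start);
>>             collector.ack(tuple);
>>         }
>>
>>         @Override
>>         public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>             // terminal bolt; no output streams to declare
>>         }
>>     }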
>>
>> The topology did not hang completely, but it did degrade severely.
>> Without metrics it was hard to tell, but it looked like some of the tasks
>> for certain Kafka partitions either stopped emitting tuples or never got
>> acknowledgements for the tuples they did emit.  Some tuples were definitely
>> making it through, though, because data was continuously being inserted into
>> Cassandra.  After I killed and resubmitted the topology, there were still
>> messages left over in the topic, but only for certain partitions.
>>
>> What queue configuration are you looking for?
>>
>> I don't believe the problem was that the Graphite metrics consumer
>> wasn't "keeping up".  In Storm UI, the processing latency was very low for
>> that pseudo-bolt, as was the capacity.  Storm UI just showed that no tuples
>> were being delivered to the bolt.
>>
>> Thanks!
>>
>> On Thu, Apr 14, 2016 at 9:00 PM, Jungtaek Lim <kabhwan@gmail.com> wrote:
>>
>>> Kevin,
>>>
>>> Do you register custom metrics?  If so, how long are their intervals, and
>>> do they vary?
>>> Did your topology stop working completely?  (I mean, did all tuples start
>>> failing after that time?)
>>> And could you share your queue configuration?
>>>
>>> You could also replace storm-graphite with LoggingMetricsConsumer and see
>>> if that helps.  If changing the consumer resolves the issue, we can guess that
>>> storm-graphite cannot keep up with the metrics.
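>>>
>>> For example, switching consumers is just a config change in the topology
>>> code (a quick sketch against the 0.10 backtype packages):
>>>
>>>     import backtype.storm.Config;
>>>     import backtype.storm.metric.LoggingMetricsConsumer;
>>>
>>>     Config conf = new Config();
>>>     // logs every metrics data point to the worker log instead of sending it to Graphite
>>>     conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);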
>>>
>>> By the way, I'm addressing metrics consumer issues (asynchronous operation,
>>> filtering).  You can track the progress here:
>>> https://issues.apache.org/jira/browse/STORM-1699
>>>
>>> I'm afraid they may not be ported to 0.10.x, but the asynchronous metrics
>>> consumer bolt <https://issues.apache.org/jira/browse/STORM-1698> is a
>>> simple patch, so you can apply it to a custom 0.10.0 build and give it a try.
>>>
>>> Hope this helps.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>>
>>> On Thu, Apr 14, 2016 at 11:06 PM, Denis DEBARBIEUX <ddebarbieux@norsys.fr>
>>> wrote:
>>>
>>>> Hi Kevin,
>>>>
>>>> I have a similar issue with Storm 0.9.6 (see the following thread:
>>>> https://mail-archives.apache.org/mod_mbox/storm-user/201603.mbox/browser
>>>> ).
>>>>
>>>> It is still open. So, please, keep me informed on your progress.
>>>>
>>>> Denis
>>>>
>>>>
>>>> On 14/04/2016 15:54, Kevin Conaway wrote:
>>>>
>>>> We are using Storm 0.10 with the following configuration:
>>>>
>>>>    - 1 Nimbus node
>>>>    - 6 Supervisor nodes, each with 2 worker slots.  Each supervisor
>>>>    has 8 cores.
>>>>
>>>>
>>>> Our topology has a KafkaSpout that forwards to a bolt where we
>>>> transform the message and insert it into Cassandra.  Our topic has 50
>>>> partitions, so we have configured the number of executors/tasks for the
>>>> KafkaSpout to be 50.  Our bolt has 150 executors/tasks.
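>>>>
>>>> Roughly, the wiring looks like this (a simplified sketch; the ZooKeeper
>>>> address, topic, ids, and the CassandraWriterBolt class are placeholder
>>>> names, not our real ones):
>>>>
>>>>     import backtype.storm.topology.TopologyBuilder;
>>>>     import storm.kafka.KafkaSpout;
>>>>     import storm.kafka.SpoutConfig;
>>>>     import storm.kafka.ZkHosts;
>>>>
>>>>     TopologyBuilder builder = new TopologyBuilder();
>>>>
>>>>     // one spout executor per Kafka partition (topic has 50 partitions)
>>>>     SpoutConfig spoutConfig = new SpoutConfig(
>>>>         new ZkHosts("zk1:2181"), "events", "/kafka-spout", "our-topology");
>>>>     builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 50);
>>>>
>>>>     // transform + Cassandra insert, 150 executors
>>>>     builder.setBolt("cassandra-bolt", new CassandraWriterBolt(), 150)
>>>>            .shuffleGrouping("kafka-spout");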
>>>>
>>>> We have also added the storm-graphite metrics consumer
>>>> (https://github.com/verisign/storm-graphite) to our topology so that
>>>> Storm's metrics are sent to our Graphite cluster.
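>>>>
>>>> The registration happens on the topology Config rather than globally in
>>>> storm.yaml (continuing the sketch above; the Graphite host/port argument
>>>> keys are assumptions to be checked against the storm-graphite README):
>>>>
>>>>     import java.util.HashMap;
>>>>     import java.util.Map;
>>>>     import backtype.storm.Config;
>>>>     import backtype.storm.StormSubmitter;
>>>>     import com.verisign.storm.metrics.GraphiteMetricsConsumer;
>>>>
>>>>     Config conf = new Config();
>>>>     conf.setNumWorkers(12);  // assuming the topology uses all 6 x 2 worker slots
>>>>
>>>>     Map<String, Object> graphiteArgs = new HashMap<String, Object>();
>>>>     graphiteArgs.put("metrics.graphite.host", "graphite.example.com");  // placeholder host
>>>>     graphiteArgs.put("metrics.graphite.port", "2003");
>>>>
>>>>     // one metrics-consumer executor; it runs as a system bolt in one of the workers
>>>>     conf.registerMetricsConsumer(GraphiteMetricsConsumer.class, graphiteArgs, 1);
>>>>
>>>>     StormSubmitter.submitTopology("our-topology", conf, builder.createTopology());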
>>>>
>>>> Yesterday we were running a 2000 tuple/sec load test and everything was
>>>> fine for a few hours until we noticed that we were no longer receiving
>>>> metrics from Storm in Graphite.
>>>>
>>>> I verified that it's not a connectivity issue between Storm and
>>>> Graphite.  Looking in Storm UI,
>>>> the __metricscom.verisign.storm.metrics.GraphiteMetricsConsumer bolt hadn't
>>>> received a single tuple in either the prior 10-minute or 3-hour window.
>>>>
>>>> Since the metrics consumer bolt was assigned to one executor, I took
>>>> thread dumps of that JVM.  I saw the following stack trace for the metrics
>>>> consumer thread:
>>>>
>>>> [The stack trace was attached as an image and is not preserved in this
>>>> archive.]
>>>>
>>
>>
>> --
>> Kevin Conaway
>> http://www.linkedin.com/pub/kevin-conaway/7/107/580/
>> https://github.com/kevinconaway
>>
>
>
>
> --
> Kevin Conaway
> http://www.linkedin.com/pub/kevin-conaway/7/107/580/
> https://github.com/kevinconaway
>
