flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Flink metrics related problems/questions
Date Mon, 22 May 2017 12:17:01 GMT
@Chesnay With timers it will happen that onTimer() is called from a different Thread than the
Tread that is calling processElement(). If Metrics updates happen in both, would that be a
problem?

> On 19. May 2017, at 11:57, Chesnay Schepler <chesnay@apache.org> wrote:
> 
> 2. isn't quite accurate actually; metrics on the TaskManager are not persisted across
restarts.
> 
> On 19.05.2017 11:21, Chesnay Schepler wrote:
>> 1. This shouldn't happen. Do you access the counter from different threads?
>> 
>> 2. Metrics in general are not persisted across restarts, and there is no way to configure
flink to do so at the moment.
>> 
>> 3. Counters are sent as gauges since as far as I know StatsD counters are not allowed
to be decremented.
>> 
>> On 19.05.2017 08:56, jaxbihani wrote:
>>> Background: We are using a job using ProcessFunction which reads data from
>>> kafka fires ~5-10K timers per second and sends matched events to KafkaSink.
>>> We are collecting metrics for collecting no of active timers, no of timers
>>> scheduled etc. We use statsd reporter and monitor using Grafana dashboard &
>>> RocksDBStateBackend backed by HDFS as state.
>>> 
>>> Observations/Problems:
>>> 1. *Counter value suddenly got reset:*  While job was running fine, on one
>>> fine moment, metric of a monotonically increasing counter (Counter where we
>>> just used inc() operation) suddenly became 0 and then resumed from there
>>> onwards. Only exception in the logs were related to transient connectivity
>>> issues to datanodes. Also there was no other indicator of any failure
>>> observed after inspecting system metrics/checkpoint metrics.  It happened
>>> just once across multiple runs of a same job.
>>> 2. *Counters not retained during flink restart with savepoint*: Cancelled
>>> job with -s option taking savepoint and then restarted the job using the
>>> savepoint.  After restart metrics started from 0. I was expecting metric
>>> value of a given operator would also be part of state.
>>> 3. *Counter metrics getting sent as Gauge*: Using tcpdump I was inspecting
>>> the format in which metric are sent to statsd. I observed that even the
>>> metric which in my code were counters, were sent as gauges. I didn't get why
>>> that was so.
>>> 
>>> Can anyone please add more insights into why above mentioned behaviors would
>>> have happened?
>>> Also does flink store metric values as a part of state for stateful
>>> operators? Is there any way to configure that?
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-metrics-related-problems-questions-tp13218.html
>>> Sent from the Apache Flink User Mailing List archive. mailing list archive at
Nabble.com.
>>> 
>> 
>> 
> 


Mime
View raw message