flink-issues mailing list archives

From "Chesnay Schepler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5072) MetricFetcher Ask Timeout
Date Tue, 15 Nov 2016 15:23:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667421#comment-15667421 ]

Chesnay Schepler commented on FLINK-5072:
-----------------------------------------

Hmm. It isn't critical if a MetricFetcher request times out; the fetcher will simply
not update the metrics in the web interface and will try again 10 seconds later if required.

However, the MetricQueryService is a separate actor that, once a job is fully running, only
receives requests from the fetcher. I would expect it to be able to serve such a request
within the 10 second timeout. But frankly, I don't know a lot about the network conditions
under heavy load.
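
To illustrate the behaviour described above (a timed-out request leaves the previously
fetched metrics in place, and a later attempt can still succeed), here is a minimal Python
sketch. It is not Flink's MetricFetcher code; the names fetch_once, slow_query, fast_query
and the "records_in" metric are hypothetical, and the timeout is shortened from 10 seconds
so the example finishes quickly.

```python
import asyncio


async def fetch_once(query, last_metrics, timeout_s):
    """Run one fetch attempt against a (hypothetical) query coroutine.

    On success the cached metrics are updated and True is returned.
    On timeout the cached metrics are left untouched and False is returned,
    so a caller can simply try again on its next scheduled update.
    """
    try:
        dump = await asyncio.wait_for(query(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return False
    last_metrics.update(dump)
    return True


async def slow_query():
    # Stands in for a query service that is too busy to answer in time.
    await asyncio.sleep(1.0)
    return {"records_in": 2}


async def fast_query():
    # Stands in for a query service that answers immediately.
    return {"records_in": 1}


def run_demo():
    metrics = {"records_in": 0}
    # First attempt times out: the old values stay visible.
    first = asyncio.run(fetch_once(slow_query, metrics, 0.05))
    snapshot = dict(metrics)
    # A later attempt succeeds and refreshes the values.
    second = asyncio.run(fetch_once(fast_query, metrics, 0.05))
    return first, snapshot, second, metrics
```

The point of the sketch is only that a timeout is handled as a skipped update rather than
as a failure of the caller; whether the real timeouts point to a deeper problem is a
separate question.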

> MetricFetcher Ask Timeout
> -------------------------
>
>                 Key: FLINK-5072
>                 URL: https://issues.apache.org/jira/browse/FLINK-5072
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ufuk Celebi
>
> Running a large scale test with 1.2-SNAPSHOT and heavy load on the TMs, I encountered
> a lot of ask timeouts for the metric fetcher:
> {code}
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@10.240.0.52:34471/user/MetricQueryService_container_1479207428252_0014_01_000026]] after [10000 ms]
> 	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
> 	at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
> 	at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
> 	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
> 	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
> 	at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
> 	at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
> 	at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> [~zentol] Does it make sense to investigate this further?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
