flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance
Date Tue, 26 May 2015 22:02:17 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559961#comment-14559961 ]

ASF GitHub Bot commented on FLINK-1952:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/731#issuecomment-105679811
  
    The latest commit "Add big not so mini cluster test for CC to provoke scheduler problem" is not going to be merged. It is solely for verifying that the scheduler now correctly handles jobs with iterations, higher parallelism, and many TaskManagers.

    You can start a 100 TaskManager cluster with the test setup in that commit and run connected components. (Give the VM 5GB heap space, then it works smoothly.)


> Cannot run ConnectedComponents example: Could not allocate a slot on instance
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1952
>                 URL: https://issues.apache.org/jira/browse/FLINK-1952
>             Project: Flink
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Priority: Blocker
>
> Steps to reproduce
> {code}
> ./bin/yarn-session.sh -n 350 
> {code}
> ... wait until they are connected ...
> {code}
> Number of connected TaskManagers changed to 266. Slots available: 266
> Number of connected TaskManagers changed to 323. Slots available: 323
> Number of connected TaskManagers changed to 334. Slots available: 334
> Number of connected TaskManagers changed to 343. Slots available: 343
> Number of connected TaskManagers changed to 350. Slots available: 350
> {code}
> Start CC
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> ---> it runs
> Run KMeans, let it fail with 
> {code}
> Failed to deploy the task Map (Map at main(KMeans.java:100)) (1/350) - execution #0 to slot SimpleSlot (2)(2)(0) - 182b7661ca9547a84591de940c47a200 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 350, but only 254 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
> {code}
> ... as expected.
> (I've waited for 10 minutes between the two submissions)
> Starting CC now will fail:
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> Error message(s):
> {code}
> Caused by: java.lang.IllegalStateException: Could not schedule consumer vertex IterationHead(WorksetIteration (Unnamed Delta Iteration)) (19/350)
> 	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:479)
> 	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:469)
> 	at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> 	... 4 more
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate a slot on instance 4a6d761cb084c32310ece1f849556faf @ cloud-19 - 1 slots - URL: akka.tcp://flink@130.149.21.23:51400/user/taskmanager, as required by the co-location constraint.
> 	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:247)
> 	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:110)
> 	at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:262)
> 	at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:436)
> 	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:475)
> 	... 9 more
> {code}
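
For reference, the KMeans failure quoted in the steps above names the configuration key 'taskmanager.network.numberOfBuffers'. A minimal sketch of raising it in conf/flink-conf.yaml follows. The value 4096 is an arbitrary illustration, not a figure taken from this thread, and a suitable value depends on the parallelism and the job. This setting concerns only the expected network buffer error; it is not presented as a fix for the slot allocation failure reported in this issue.

```
# conf/flink-conf.yaml
# Assumed example value; the error above reports the current total as 2048.
taskmanager.network.numberOfBuffers: 4096
```

The TaskManagers would presumably need to be restarted (here, a new YARN session) for a changed value to take effect.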



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
