flink-issues mailing list archives

From "Robert Metzger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance
Date Tue, 28 Apr 2015 11:08:06 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516818#comment-14516818 ]

Robert Metzger commented on FLINK-1952:
---------------------------------------

Also, I saw this error; maybe it's helpful:
{code}
Could not prepare the execution graph org.apache.flink.runtime.executiongraph.ExecutionGraph@17646800 for archiving.
java.lang.IllegalStateException: SlotSharingGroup cannot clear task assignment, group still has allocated resources.
	at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroup.clearTaskAssignment(SlotSharingGroup.java:76)
	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.prepareForArchiving(ExecutionJobVertex.java:400)
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:771)
	at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$removeJob(JobManager.scala:675)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:329)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:99)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

> Cannot run ConnectedComponents example: Could not allocate a slot on instance
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-1952
>                 URL: https://issues.apache.org/jira/browse/FLINK-1952
>             Project: Flink
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>
> Steps to reproduce
> {code}
> ./bin/yarn-session.sh -n 350 
> {code}
> ... wait until they are connected ...
> {code}
> Number of connected TaskManagers changed to 266. Slots available: 266
> Number of connected TaskManagers changed to 323. Slots available: 323
> Number of connected TaskManagers changed to 334. Slots available: 334
> Number of connected TaskManagers changed to 343. Slots available: 343
> Number of connected TaskManagers changed to 350. Slots available: 350
> {code}
> Start CC
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> ---> it runs
> Run KMeans, let it fail with 
> {code}
> Failed to deploy the task Map (Map at main(KMeans.java:100)) (1/350) - execution #0 to slot SimpleSlot (2)(2)(0) - 182b7661ca9547a84591de940c47a200 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 350, but only 254 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
> {code}
> ... as expected.
> (I've waited for 10 minutes between the two submissions)
> Starting CC now will fail:
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> Error message(s):
> {code}
> Caused by: java.lang.IllegalStateException: Could not schedule consumer vertex IterationHead(WorksetIteration (Unnamed Delta Iteration)) (19/350)
> 	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:479)
> 	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:469)
> 	at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> 	... 4 more
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate a slot on instance 4a6d761cb084c32310ece1f849556faf @ cloud-19 - 1 slots - URL: akka.tcp://flink@130.149.21.23:51400/user/taskmanager, as required by the co-location constraint.
> 	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:247)
> 	at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:110)
> 	at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:262)
> 	at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:436)
> 	at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:475)
> 	... 9 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
