spark-user mailing list archives

From Ritesh Kumar Singh <riteshoneinamill...@gmail.com>
Subject Re: Spark On Yarn Issue: Initial job has not accepted any resources
Date Tue, 18 Nov 2014 08:13:34 GMT
Not sure how to solve this, but I spotted these lines in the logs:

14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as
*failed*: container_1415961020140_0325_01_000002

14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as
*failed*: container_1415961020140_0325_01_000003

And the lines following them show YARN allocating executor containers with
1408 MB of memory each, but each container then completes with exit status
1. You might want to look into why those executors are dying.
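For what it's worth, here is a quick sketch of pulling the failed container IDs out of that log excerpt so you can go look at those containers' own logs (the sed pattern is just one way to do it; it matches whatever follows "marked as failed: "):

```shell
# AM log excerpt from above, with the two "marked as failed" lines.
log='14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as failed: container_1415961020140_0325_01_000002
14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as failed: container_1415961020140_0325_01_000003'

# Extract just the container IDs of the failed containers.
printf '%s\n' "$log" | sed -n 's/.*marked as failed: \(container_[0-9_]*\)$/\1/p'
# prints:
# container_1415961020140_0325_01_000002
# container_1415961020140_0325_01_000003
```

With those IDs in hand, and assuming log aggregation is enabled on your cluster, `yarn logs -applicationId application_1415961020140_0325` should dump every container's stdout/stderr, which is where the actual reason for the exit status 1 will show up.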


On Tue, Nov 18, 2014 at 1:23 PM, LinCharlie <lin_qili@outlook.com> wrote:

> Hi All:
> I was submitting a Spark program jar to a Spark-on-YARN cluster from a
> driver machine in yarn-client mode. Here is the spark-submit command I
> used:
>
> ./spark-submit --master yarn-client --class
> com.charlie.spark.grax.OldFollowersExample --queue dt_spark
> ~/script/spark-flume-test-0.1-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.1.jar
>
> The queue `dt_spark` was free, and the program was submitted successfully
> and ran on the cluster. But on the console, it repeatedly showed:
>
> 14/11/18 15:11:48 WARN YarnClientClusterScheduler: Initial job has not
> accepted any resources; check your cluster UI to ensure that workers are
> registered and have sufficient memory
>
> Checking the cluster UI logs, I found no errors:
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/disk5/yarn/usercache/linqili/filecache/6957209742046754908/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.0.0-cdh4.2.1/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/11/18 14:28:16 INFO SecurityManager: Changing view acls to: hadoop,linqili
> 14/11/18 14:28:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, linqili)
> 14/11/18 14:28:17 INFO Slf4jLogger: Slf4jLogger started
> 14/11/18 14:28:17 INFO Remoting: Starting remoting
> 14/11/18 14:28:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkYarnAM@longzhou-hdp3.lz.dscc:37187]
> 14/11/18 14:28:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkYarnAM@longzhou-hdp3.lz.dscc:37187]
> 14/11/18 14:28:17 INFO ExecutorLauncher: ApplicationAttemptId: appattempt_1415961020140_0325_000001
> 14/11/18 14:28:17 INFO ExecutorLauncher: Connecting to ResourceManager at longzhou-hdpnn.lz.dscc/192.168.19.107:12032
> 14/11/18 14:28:17 INFO ExecutorLauncher: Registering the ApplicationMaster
> 14/11/18 14:28:18 INFO ExecutorLauncher: Waiting for spark driver to be reachable.
> 14/11/18 14:28:18 INFO ExecutorLauncher: Master now available: 192.168.59.90:36691
> 14/11/18 14:28:18 INFO ExecutorLauncher: Listen to driver: akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler
> 14/11/18 14:28:18 INFO ExecutorLauncher: Allocating 1 executors.
> 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each.
> 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each.
> 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:18 INFO RackResolver: Resolved longzhou-hdp3.lz.dscc to /rack1
> 14/11/18 14:28:18 INFO YarnAllocationHandler: launching container on container_1415961020140_0325_01_000002 host longzhou-hdp3.lz.dscc
> 14/11/18 14:28:18 INFO ExecutorRunnable: Starting Executor Container
> 14/11/18 14:28:18 INFO ExecutorRunnable: Connecting to ContainerManager at longzhou-hdp3.lz.dscc:12040
> 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up ContainerLaunchContext
> 14/11/18 14:28:18 INFO ExecutorRunnable: Preparing Local resources
> 14/11/18 14:28:18 INFO ExecutorLauncher: All executors have launched.
> 14/11/18 14:28:18 INFO ExecutorLauncher: Started progress reporter thread - sleep time : 5000
> 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:18 INFO ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource {, scheme: "hdfs", host: "longzhou-hdpnn.lz.dscc", port: 11000, file: "/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar", }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE, )
> 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m -Xmx1024m , -Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf -Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64, -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 1, longzhou-hdp3.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 14/11/18 14:28:23 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:23 INFO YarnAllocationHandler: Completed container container_1415961020140_0325_01_000002 (state: COMPLETE, exit status: 1)
> 14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as failed: container_1415961020140_0325_01_000002
> 14/11/18 14:28:28 INFO ExecutorLauncher: Allocating 1 containers to make up for (potentially ?) lost containers
> 14/11/18 14:28:28 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each.
> 14/11/18 14:28:28 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:33 INFO ExecutorLauncher: Allocating 1 containers to make up for (potentially ?) lost containers
> 14/11/18 14:28:33 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each.
> 14/11/18 14:28:33 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:33 INFO RackResolver: Resolved longzhou-hdp2.lz.dscc to /rack1
> 14/11/18 14:28:33 INFO YarnAllocationHandler: launching container on container_1415961020140_0325_01_000003 host longzhou-hdp2.lz.dscc
> 14/11/18 14:28:33 INFO ExecutorRunnable: Starting Executor Container
> 14/11/18 14:28:33 INFO ExecutorRunnable: Connecting to ContainerManager at longzhou-hdp2.lz.dscc:12040
> 14/11/18 14:28:33 INFO ExecutorRunnable: Setting up ContainerLaunchContext
> 14/11/18 14:28:33 INFO ExecutorRunnable: Preparing Local resources
> 14/11/18 14:28:33 INFO ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource {, scheme: "hdfs", host: "longzhou-hdpnn.lz.dscc", port: 11000, file: "/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar", }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE, )
> 14/11/18 14:28:33 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m -Xmx1024m , -Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf -Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64, -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 2, longzhou-hdp2.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 14/11/18 14:28:38 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:38 INFO YarnAllocationHandler: Ignoring container container_1415961020140_0325_01_000004 at host longzhou-hdp2.lz.dscc, since we already have the required number of containers for it.
> 14/11/18 14:28:38 INFO YarnAllocationHandler: Completed container container_1415961020140_0325_01_000003 (state: COMPLETE, exit status: 1)
> 14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as failed: container_1415961020140_0325_01_000003
> 14/11/18 14:28:43 INFO ExecutorLauncher: Allocating 1 containers to make up for (potentially ?) lost containers
> 14/11/18 14:28:43 INFO YarnAllocationHandler: Releasing 1 containers. pendingReleaseContainers : {container_1415961020140_0325_01_000004=true}
> 14/11/18 14:28:43 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each.
> 14/11/18 14:28:43 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:48 INFO ExecutorLauncher: Allocating 1 containers to make up for (potentially ?) lost containers
> 14/11/18 14:28:48 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each.
> 14/11/18 14:28:48 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:48 INFO YarnAllocationHandler: launching container on container_1415961020140_0325_01_000005 host longzhou-hdp2.lz.dscc
> 14/11/18 14:28:48 INFO ExecutorRunnable: Starting Executor Container
> 14/11/18 14:28:48 INFO ExecutorRunnable: Connecting to ContainerManager at longzhou-hdp2.lz.dscc:12040
> 14/11/18 14:28:48 INFO ExecutorRunnable: Setting up ContainerLaunchContext
> 14/11/18 14:28:48 INFO ExecutorRunnable: Preparing Local resources
> 14/11/18 14:28:48 INFO ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource {, scheme: "hdfs", host: "longzhou-hdpnn.lz.dscc", port: 11000, file: "/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar", }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE, )
> 14/11/18 14:28:48 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m -Xmx1024m , -Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf -Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64, -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 3, longzhou-hdp2.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 14/11/18 14:28:53 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:53 INFO YarnAllocationHandler: Ignoring container container_1415961020140_0325_01_000006 at host longzhou-hdp2.lz.dscc, since we already have the required number of containers for it.
>
>
> Is there any hint? Thanks.
>
>
