spark-user mailing list archives

From Brian Belgodere <bbelgod...@gmail.com>
Subject Spark 1.2.0 resource issue with Mesos 0.21.1
Date Mon, 19 Jan 2015 14:40:08 GMT
Hi All,
I'm running into a strange issue with my test Mesos cluster, a 3-master /
3-slave HA configuration. Marathon and Chronos work as they should, and I
can deploy Dockerized applications to the slave nodes without issue using
Marathon. I downloaded Spark 1.2.0 and built it from source. Standalone
mode works correctly, but when I submit jobs to the Mesos cluster from
Spark, the driver connects and shows up as a framework, yet I get "Initial
job has not accepted any resources; check your cluster UI to ensure that
workers are registered and have sufficient memory". I have appended the
relevant info below and would appreciate any help with this. I've tried
both coarse-grained and fine-grained mode and get the same result.

-Brian

I'm running on Ubuntu Trusty (14.04) 64-bit.

My spark-env.sh contains:

export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://192.0.3.11:8081/spark-1.2.0.tgz
export MASTER=mesos://zk://192.0.3.11:2181,192.0.3.12:2181,192.0.3.13:2181/mesos
export SPARK_WORKER_MEMORY=512M
export SPARK_WORKER_CORES=1
export SPARK_LOCAL_IP=192.0.3.11
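For context on the warning that follows: Spark's Mesos backend only
launches tasks on offers large enough for one executor, where the memory
demand is spark.executor.memory (512 MB by default in Spark 1.2) plus a
JVM overhead. A rough self-check against the offers listed below; the
overhead formula here (max of 384 MB or 10% of executor memory) is an
assumption modeled on Spark 1.x's Mesos memory accounting, so verify it
against your exact version:

```python
def offer_fits_executor(offer_mem_mb, offer_cpus,
                        executor_mem_mb=512, cpus_needed=1):
    """Rough check: can a single Mesos offer host one Spark executor?

    The overhead below (max(384 MB, 10% of executor memory)) is an
    assumption based on Spark 1.x's Mesos memory accounting.
    """
    overhead_mb = max(384, int(0.10 * executor_mem_mb))
    need_mb = executor_mem_mb + overhead_mb
    return offer_mem_mb >= need_mb and offer_cpus >= cpus_needed

# ~2.9 GB idle across 3 slaves is roughly 990 MB per offer:
print(offer_fits_executor(990, 1))   # 512 + 384 = 896 MB needed -> True
print(offer_fits_executor(512, 1))   # offer smaller than demand -> False
```

By this accounting a ~990 MB offer should satisfy the default executor,
which is one way to narrow down whether the problem is resource sizing or
something else (e.g. the slaves failing to fetch the executor tarball).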

The Mesos cluster UI shows:

Cluster:  Mesos_Cluster
Server:   192.0.3.12:5050
Version:  0.21.1
Built:    a week ago by root
Started:  2 hours ago
Elected:  2 hours ago

Resources:

             CPUs   Mem
    Total    3      2.9 GB
    Used     0      0 B
    Offered  0      0 B
    Idle     3      2.9 GB


In the Spark log I see:

vagrant@master1:~/spark-1.2.0$ ./bin/run-example SparkPi 3
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/01/19 02:41:40 INFO SecurityManager: Changing view acls to: vagrant
15/01/19 02:41:40 INFO SecurityManager: Changing modify acls to: vagrant
15/01/19 02:41:40 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(vagrant);
users with modify permissions: Set(vagrant)
15/01/19 02:41:41 INFO Slf4jLogger: Slf4jLogger started
15/01/19 02:41:41 INFO Remoting: Starting remoting
15/01/19 02:41:42 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@master1:56626]
15/01/19 02:41:42 INFO Utils: Successfully started service 'sparkDriver' on
port 56626.
15/01/19 02:41:42 INFO SparkEnv: Registering MapOutputTracker
15/01/19 02:41:42 INFO SparkEnv: Registering BlockManagerMaster
15/01/19 02:41:42 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20150119024142-16af
15/01/19 02:41:42 INFO MemoryStore: MemoryStore started with capacity 267.3
MB
15/01/19 02:41:42 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-80342d7e-780f-4550-933d-adce88265322
15/01/19 02:41:42 INFO HttpServer: Starting HTTP Server
15/01/19 02:41:42 INFO Utils: Successfully started service 'HTTP file
server' on port 36273.
15/01/19 02:41:43 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
15/01/19 02:41:43 INFO SparkUI: Started SparkUI at http://master1:4040
15/01/19 02:41:43 INFO SparkContext: Added JAR
file:/home/vagrant/spark-1.2.0/examples/target/scala-2.10/spark-examples-1.2.0-hadoop1.0.4.jar
at http://192.0.3.11:36273/jars/spark-examples-1.2.0-hadoop1.0.4.jar with
timestamp 1421635303639
2015-01-19 02:41:44,069:19208(0x7f7da54b3700):ZOO_INFO@log_env@712: Client
environment:zookeeper.version=zookeeper C client 3.4.5
2015-01-19 02:41:44,070:19208(0x7f7da54b3700):ZOO_INFO@log_env@716: Client
environment:host.name=master1
2015-01-19 02:41:44,070:19208(0x7f7da54b3700):ZOO_INFO@log_env@723: Client
environment:os.name=Linux
2015-01-19 02:41:44,071:19208(0x7f7da54b3700):ZOO_INFO@log_env@724: Client
environment:os.arch=3.13.0-43-generic
2015-01-19 02:41:44,071:19208(0x7f7da54b3700):ZOO_INFO@log_env@725: Client
environment:os.version=#72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014
2015-01-19 02:41:44,072:19208(0x7f7da54b3700):ZOO_INFO@log_env@733: Client
environment:user.name=vagrant
2015-01-19 02:41:44,072:19208(0x7f7da54b3700):ZOO_INFO@log_env@741: Client
environment:user.home=/home/vagrant
2015-01-19 02:41:44,073:19208(0x7f7da54b3700):ZOO_INFO@log_env@753: Client
environment:user.dir=/home/vagrant/spark-1.2.0
2015-01-19 02:41:44,073:19208(0x7f7da54b3700):ZOO_INFO@zookeeper_init@786:
Initiating client connection, host=192.0.3.11:2181,192.0.3.12:2181,
192.0.3.13:2181sessionTimeout=10000 watcher=0x7f7daa4516a0 sessionId=0
sessionPasswd=<null> context=0xcf0a60 flags=0
2015-01-19 02:41:44,077:19208(0x7f7da3cb0700):ZOO_INFO@check_events@1703:
initiated connection to server [192.0.3.13:2181]
2015-01-19 02:41:44,080:19208(0x7f7da3cb0700):ZOO_INFO@check_events@1750:
session establishment complete on server [192.0.3.13:2181],
sessionId=0x34aff9e627f000e, negotiated timeout=10000
I0119 02:41:44.082293 19313 sched.cpp:137] Version: 0.21.1
I0119 02:41:44.088546 19315 group.cpp:313] Group process (group(1)@
192.0.3.11:50317) connected to ZooKeeper
I0119 02:41:44.088948 19315 group.cpp:790] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I0119 02:41:44.089274 19315 group.cpp:385] Trying to create path '/mesos'
in ZooKeeper
I0119 02:41:44.112208 19320 detector.cpp:138] Detected a new leader:
(id='2')
I0119 02:41:44.113049 19315 group.cpp:659] Trying to get
'/mesos/info_0000000002' in ZooKeeper
I0119 02:41:44.115067 19316 detector.cpp:433] A new leading master (UPID=
master@192.0.3.12:5050) is detected
I0119 02:41:44.118728 19317 sched.cpp:234] New master detected at
master@192.0.3.12:5050
I0119 02:41:44.119282 19317 sched.cpp:242] No credentials provided.
Attempting to register without authentication
I0119 02:41:44.123064 19317 sched.cpp:408] Framework registered with
20150119-003609-201523392-5050-7198-0002
15/01/19 02:41:44 INFO MesosSchedulerBackend: Registered as framework ID
20150119-003609-201523392-5050-7198-0002
15/01/19 02:41:44 INFO NettyBlockTransferService: Server created on 54462
15/01/19 02:41:44 INFO BlockManagerMaster: Trying to register BlockManager
15/01/19 02:41:44 INFO BlockManagerMasterActor: Registering block manager
master1:54462 with 267.3 MB RAM, BlockManagerId(<driver>, master1, 54462)
15/01/19 02:41:44 INFO BlockManagerMaster: Registered BlockManager
15/01/19 02:41:44 INFO SparkContext: Starting job: reduce at
SparkPi.scala:35
15/01/19 02:41:44 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35)
with 3 output partitions (allowLocal=false)
15/01/19 02:41:44 INFO DAGScheduler: Final stage: Stage 0(reduce at
SparkPi.scala:35)
15/01/19 02:41:44 INFO DAGScheduler: Parents of final stage: List()
15/01/19 02:41:44 INFO DAGScheduler: Missing parents: List()
15/01/19 02:41:44 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at
map at SparkPi.scala:31), which has no missing parents
15/01/19 02:41:45 INFO MemoryStore: ensureFreeSpace(1728) called with
curMem=0, maxMem=280248975
15/01/19 02:41:45 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1728.0 B, free 267.3 MB)
15/01/19 02:41:45 INFO MemoryStore: ensureFreeSpace(1235) called with
curMem=1728, maxMem=280248975
15/01/19 02:41:45 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1235.0 B, free 267.3 MB)
15/01/19 02:41:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on master1:54462 (size: 1235.0 B, free: 267.3 MB)
15/01/19 02:41:45 INFO BlockManagerMaster: Updated info of block
broadcast_0_piece0
15/01/19 02:41:45 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:838
15/01/19 02:41:45 INFO DAGScheduler: Submitting 3 missing tasks from Stage
0 (MappedRDD[1] at map at SparkPi.scala:31)
15/01/19 02:41:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/01/19 02:42:00 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory


and it keeps repeating "Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
memory"

I have verified that http://192.0.3.11:8081/spark-1.2.0.tgz is accessible
from all the slave nodes.
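One way to make that reachability check mechanical (a sketch; run it on
each slave, substituting your SPARK_EXECUTOR_URI, since it is the Mesos
executors rather than the driver that fetch the tarball):

```python
import urllib.request
import urllib.error

def executor_uri_reachable(url, timeout=5):
    """Return True if the executor tarball URI answers an HTTP HEAD
    request within the timeout; False on any connection or HTTP error."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

# e.g. executor_uri_reachable("http://192.0.3.11:8081/spark-1.2.0.tgz")
```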




My Spark environment (from the Spark UI's Environment tab):

Runtime Information

  Java Home      /usr/lib/jvm/java-7-openjdk-amd64/jre
  Java Version   1.7.0_65 (Oracle Corporation)
  Scala Version  version 2.10.4

Spark Properties

  spark.app.id                   20150119-003609-201523392-5050-7198-0005
  spark.app.name                 Spark Pi
  spark.driver.host              master1
  spark.driver.port              46107
  spark.executor.id              driver
  spark.fileserver.uri           http://192.0.3.11:55424
  spark.jars                     file:/home/vagrant/spark-1.2.0/examples/target/scala-2.10/spark-examples-1.2.0-hadoop1.0.4.jar
  spark.master                   mesos://zk://192.0.3.11:2181,192.0.3.12:2181,192.0.3.13:2181/mesos
  spark.scheduler.mode           FIFO
  spark.tachyonStore.folderName  spark-3dffd4bb-f23b-43f7-a498-54b401dc591b

System Properties

  SPARK_SUBMIT                   true
  awt.toolkit                    sun.awt.X11.XToolkit
  file.encoding                  UTF-8
  file.encoding.pkg              sun.io
  file.separator                 /
  java.awt.graphicsenv           sun.awt.X11GraphicsEnvironment
  java.awt.printerjob            sun.print.PSPrinterJob
  java.class.version             51.0
  java.endorsed.dirs             /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/endorsed
  java.ext.dirs                  /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext:/usr/java/packages/lib/ext
  java.home                      /usr/lib/jvm/java-7-openjdk-amd64/jre
  java.io.tmpdir                 /tmp
  java.library.path              /usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
  java.runtime.name              OpenJDK Runtime Environment
  java.runtime.version           1.7.0_65-b32
  java.specification.name        Java Platform API Specification
  java.specification.vendor      Oracle Corporation
  java.specification.version     1.7
  java.vendor                    Oracle Corporation
  java.vendor.url                http://java.oracle.com/
  java.vendor.url.bug            http://bugreport.sun.com/bugreport/
  java.version                   1.7.0_65
  java.vm.info                   mixed mode
  java.vm.name                   OpenJDK 64-Bit Server VM
  java.vm.specification.name     Java Virtual Machine Specification
  java.vm.specification.vendor   Oracle Corporation
  java.vm.specification.version  1.7
  java.vm.vendor                 Oracle Corporation
  java.vm.version                24.65-b04
  line.separator                 (empty)
  os.arch                        amd64
  os.name                        Linux
  os.version                     3.13.0-43-generic
  path.separator                 :
  sun.arch.data.model            64
  sun.boot.class.path            /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rhino.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/classes
  sun.boot.library.path          /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64
  sun.cpu.endian                 little
  sun.cpu.isalist                (empty)
  sun.io.unicode.encoding        UnicodeLittle
  sun.java.command               org.apache.spark.deploy.SparkSubmit --master mesos://zk://192.0.3.11:2181,192.0.3.12:2181,192.0.3.13:2181/mesos --class org.apache.spark.examples.SparkPi /home/vagrant/spark-1.2.0/examples/target/scala-2.10/spark-examples-1.2.0-hadoop1.0.4.jar
  sun.java.launcher              SUN_STANDARD
  sun.jnu.encoding               UTF-8
  sun.management.compiler        HotSpot 64-Bit Tiered Compilers
  sun.nio.ch.bugLevel            (empty)
  sun.os.patch.level             unknown
  user.country                   US
  user.dir                       /home/vagrant/spark-1.2.0
  user.home                      /home/vagrant
  user.language                  en
  user.name                      vagrant
  user.timezone                  Etc/UTC

Classpath Entries

  /home/vagrant/spark-1.2.0/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop1.0.4.jar  (System Classpath)
  /home/vagrant/spark-1.2.0/conf  (System Classpath)
  http://192.0.3.11:55424/jars/spark-examples-1.2.0-hadoop1.0.4.jar  (Added By User)
