spark-user mailing list archives

From Joseph Lust <jl...@mc10inc.com>
Subject cannot run spark-shell interactively against cluster from remote host - confusing memory warnings
Date Sat, 24 Jan 2015 21:10:08 GMT
I’ve set up a Spark cluster in the last few weeks and everything is working, but I cannot
run spark-shell interactively against the cluster from a remote host:

  *   Deploy .jar to the cluster from remote (laptop) via spark-submit and have it run – Check (invocation sketched below this list)
  *   Run .jar on spark-shell locally – Check
  *   Run same .jar on spark-shell on master server – Check
  *   Run spark-shell interactively against cluster on master server – Check
  *   Run spark-shell interactively from remote (laptop) against cluster – FAIL
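
For reference, the working remote spark-submit case was launched roughly like this (the main class, jar name, and 512m executor memory below are placeholders, not the actual values; the master URL matches the one used later in this message):

$ spark-submit \
    --master spark://XXXX:7077 \
    --class com.example.MyJob \
    --executor-memory 512m \
    my-job-assembly.jar

So the cluster does accept work submitted from the laptop; it is only the interactive shell case that fails.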

It seems other people have faced this same issue:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-working-local-but-not-remote-td19727.html

I’m getting the same warnings about memory, despite plenty of memory being available for
the job to run (see the working cases above):

"WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI
to ensure that workers are registered and have sufficient memory”

Some have suggested that the memory warning is spurious and that the real problem is
conflicting JARs on the classpath:
http://apache-spark-user-list.1001560.n3.nabble.com/WARN-ClusterScheduler-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-thay-td374.html#a396
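
If classpath conflicts were the culprit here too, one quick sanity check (just a sketch; nothing above confirms this is actually the cause) is to compare what the driver is configured with in the working and failing shells:

// Run inside spark-shell on the master (works) and on the laptop (fails), then diff the output.
sc.getConf.getAll.sorted.foreach(println)          // effective Spark configuration, including spark.jars
println(System.getProperty("java.class.path"))     // the driver JVM's own classpath

If both runs ship the same JARs and configuration, the classpath theory looks less likely for this case.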

Details:

  *   Cluster: 1 master, 3 workers, each a 4 GB / 4-core Ubuntu 14.04 LTS box
  *   Local machine (the remote laptop): MacBook Pro, OS X 10.10.1
  *   All running HotSpot Java (builds 1.8.0_31-b13 and 1.8.0_25-b17)
  *   All running spark-1.2.0-bin-hadoop2.4
  *   Using the Standalone cluster manager

Cluster UI: (screenshot attached below)

Even when I clamp the request down to the most restrictive settings (1 core, 1 executor, 128 MB of the 3 GB available),
it still says I don’t have the resources:

>>>> Start console example
$ spark-shell --executor-memory 128m --total-executor-cores 1 --driver-cores 1 --master spark://XXXX:7077

15/01/24 15:57:29 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> val rdd = sc.parallelize(1 to 1000);
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> rdd.count

15/01/24 15:58:20 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/24 15:58:20 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/01/24 15:58:20 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0]
at parallelize at <console>:12)
15/01/24 15:58:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/24 15:58:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient memory
>>>> End console example
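
One more data point from the failing remote shell, in case it helps: the snippet below (a minimal sketch) asks the driver which block managers have registered with it. The driver’s own block manager is always listed, so a single entry means no executor ever connected back to this driver.

// Each entry is blockManagerId -> (max memory, remaining memory).
sc.getExecutorMemoryStatus.foreach { case (id, (max, remaining)) =>
  println(s"$id  max=$max  remaining=$remaining")
}

If only the driver shows up even after the warning fires, the workers never registered any executors with this driver, which would suggest the problem is not really about memory.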

So, can anyone tell me whether running spark-shell interactively against a Standalone cluster from a remote host even works?
Thanks for your help.

The cluster UI screenshot below shows that the job is running on the cluster, that it has a driver app and a worker,
and that there are plenty of cores and gigabytes of memory free.

(attachment: cluster UI screenshot)

Sincerely,
Joe Lust
