Sorry, the master Spark URL in the web UI is spark://, exactly as configured.

On 6/27/14, 9:07 AM, Shannon Quinn wrote:
I put the settings as you specified in for the master. When I run, the web UI shows both the worker on the master (machine1) and the slave worker (machine2) as ALIVE and ready, with the master URL at spark://. However, when I run spark-submit, it immediately crashes with:

py4j.protocol.Py4JJavaError
14/06/27 09:01:32 ERROR Remoting: Remoting error: [Startup failed]
akka.remote.RemoteTransportException: Startup failed
[...] Failed to bind to /
[...] Address already in use.

This seems entirely contrary to intuition; why would Spark be unable to bind to the exact IP:port set for the master?
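"Address already in use" is the operating system refusing a bind() because another socket on the same host is already listening on that IP:port. The thread does not establish which Spark process makes the second bind attempt, so the sketch below only reproduces the OS-level error with Python's standard socket module. It uses a throwaway port on 127.0.0.1 rather than 5060, and try_double_bind is a hypothetical helper name:

```python
import errno
import socket


def try_double_bind(host: str = "127.0.0.1") -> int:
    """Bind a listening socket, then try to bind a second socket to the same
    IP:port. Return the errno of the second bind (0 if it unexpectedly succeeds)."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))  # port 0: let the OS pick a free port
    first.listen()
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind((host, port))  # same IP:port as the existing listener
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = try_double_bind()
    print(code == errno.EADDRINUSE, errno.errorcode.get(code))
```

On Linux this prints "True EADDRINUSE": the second socket cannot claim an address that a listening socket already holds.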

On 6/27/14, 1:54 AM, Akhil Das wrote:
Hi Shannon,

How about a setting like the following (I just removed the quotes)?


Not sure what's happening in your case; it could be that your system is not able to bind to the address. What is the spark:// master URL that you are seeing in the web UI? (It should be spark:// in your case.)

Best Regards

On Fri, Jun 27, 2014 at 5:47 AM, Shannon Quinn <> wrote:
In the interest of completeness, this is how I invoke Spark:

[on master]

> sbin/
> spark-submit --py-files


On Jun 26, 2014, at 17:29, Shannon Quinn <> wrote:

My *best guess* (please correct me if I'm wrong) is that the master (machine1) is sending the command to the worker (machine2) with the localhost argument as-is; that is, machine2 isn't doing any weird address conversion on its end.

Consequently, I've been focusing on the settings of the master (machine1), but I haven't found anything to indicate where the localhost argument could be coming from. /etc/hosts lists only as localhost; spark-defaults.conf lists spark.master as the full IP address (not; on the master also lists the full IP under SPARK_MASTER_IP. The *only* place on the master where it's associated with localhost is SPARK_LOCAL_IP.

Looking at the logs of the worker spawned on the master, it is also receiving a "spark://localhost:5060" argument, but since that worker resides on the master, it works fine. Is it possible that the master is, for some reason, passing "spark://{SPARK_LOCAL_IP}:5060" to the workers?

That was my motivation for commenting out SPARK_LOCAL_IP; however, when I do that, the master crashes immediately because the address is already in use.

Any ideas? Thanks!


On 6/26/14, 10:14 AM, Akhil Das wrote:
Can you paste your file?

Best Regards

On Thu, Jun 26, 2014 at 7:01 PM, Shannon Quinn <> wrote:
Both machines' /etc/hosts files contain each other's IP addresses. Telnetting from machine2 to machine1 on port 5060 works just fine.

Here's the output of lsof:

java    23985 user   30u  IPv6 11092354      0t0  TCP machine1:sip (LISTEN)
java    23985 user   40u  IPv6 11099560      0t0  TCP machine1:sip->machine1:48315 (ESTABLISHED)
java    23985 user   52u  IPv6 11100405      0t0  TCP machine1:sip->machine2:54476 (ESTABLISHED)
java    24157 user   40u  IPv6 11092413      0t0  TCP machine1:48315->machine1:sip (ESTABLISHED)

Ubuntu seems to recognize 5060 as the standard port for "sip"; nothing besides Spark is actually running there. The output just does a s/5060/sip/g.
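In the lsof output above, the LISTEN line is the master's socket on port 5060 (printed as "sip"; lsof -P would print the numeric port instead), and the ESTABLISHED lines ending in ->machine1:48315 and ->machine2:54476 are incoming connections, presumably one from each worker. The sketch below pulls those remote peers out of lsof lines. peers_of and the regular expression are hypothetical, written against the exact line format shown here, and may not match other lsof versions or IPv6 literals:

```python
import re

# Matches the NODE and NAME columns of lsof TCP lines, e.g.
# "TCP machine1:sip->machine2:54476 (ESTABLISHED)" or "TCP machine1:sip (LISTEN)".
TCP_NAME = re.compile(r"TCP (\S+?):(\S+?)(?:->(\S+?):(\S+))? \((\w+)\)$")


def peers_of(lsof_lines, local_port="sip"):
    """Return "host:port" for every ESTABLISHED remote end connected to local_port."""
    peers = []
    for line in lsof_lines:
        m = TCP_NAME.search(line.strip())
        if not m:
            continue  # not a TCP line in the expected format
        _local_host, port, remote_host, remote_port, state = m.groups()
        if port == local_port and remote_host and state == "ESTABLISHED":
            peers.append(f"{remote_host}:{remote_port}")
    return peers
```

Fed the four lines above, it should list machine1:48315 and machine2:54476, i.e. both workers hold a connection to the master port. With lsof -P output, pass local_port="5060" instead.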

Is there something to the fact that every time I comment out SPARK_LOCAL_IP in spark-env, it crashes immediately upon spark-submit due to the "address already being in use"? Or am I barking up the wrong tree on that one?

Thanks again for all your help; I hope we can knock this one out.


On 6/26/14, 9:13 AM, Akhil Das wrote:
Do you also have "<ip>            machine1" in your worker's /etc/hosts? If so, try telnetting from machine2 to machine1 on port 5060. Also make sure nothing other than Spark is running on port 5060 (lsof -i:5060).
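The telnet test can also be scripted. This is a minimal sketch using only Python's standard library; can_connect is a hypothetical helper name, and a True result only shows that something accepts TCP connections on that port, not that it is the Spark master:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (a quick telnet-style check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or host name not resolvable
        return False
```

Run on machine2, can_connect("machine1", 5060) should return True when the master port is reachable, matching a successful telnet.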

Best Regards

On Thu, Jun 26, 2014 at 6:35 PM, Shannon Quinn <> wrote:
Still running into the same problem. /etc/hosts on the master says:
    localhost
<ip>            machine1

<ip> is the same address set in for SPARK_MASTER_IP. Any other ideas?

On 6/26/14, 3:11 AM, Akhil Das wrote:
Hi Shannon,

It is most likely a configuration issue. Check your /etc/hosts and make sure localhost is not associated with the SPARK_MASTER_IP you provided.
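Akhil's check can be automated by parsing the hosts file. The sketch below flags a line where localhost is an alias of the master IP. It also adds one extra check that is an assumption, not part of the suggestion above: the master's hostname should not map to a loopback address (Debian/Ubuntu commonly add a 127.0.1.1 line for the machine's own hostname). Function names are hypothetical, and 192.0.2.10 is a placeholder documentation address, not a value from this thread:

```python
import ipaddress


def parse_hosts(hosts_text: str) -> list[tuple[str, list[str]]]:
    """Return (address, [names...]) pairs from /etc/hosts-style text."""
    entries = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def is_loopback_addr(addr: str) -> bool:
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:  # malformed address field: treat it as not loopback
        return False


def hosts_problems(hosts_text: str, master_ip: str, master_name: str) -> list[str]:
    """List /etc/hosts mappings that tie the master to localhost or to loopback."""
    problems = []
    for addr, names in parse_hosts(hosts_text):
        # Akhil's check: localhost must not be an alias of SPARK_MASTER_IP.
        if addr == master_ip and "localhost" in names:
            problems.append(f"localhost is mapped to the master IP {master_ip}")
        # Extra check (assumption): the master's hostname should not point
        # at a loopback address such as 127.0.1.1.
        if master_name in names and is_loopback_addr(addr):
            problems.append(f"{master_name} is mapped to loopback address {addr}")
    return problems


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        # "192.0.2.10" is a placeholder; substitute your SPARK_MASTER_IP.
        print(hosts_problems(f.read(), "192.0.2.10", "machine1"))
```

An empty list means neither mapping was found in the file.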

Best Regards

On Thu, Jun 26, 2014 at 6:37 AM, Shannon Quinn <> wrote:
Hi all,

I have set up a two-machine Spark cluster: a master and a worker on machine1, and a worker on machine2. When I run 'sbin/', everything starts up as it should. I see both workers listed on the UI page, and the logs of both workers indicate successful registration with the Spark master.

The problems begin when I attempt to submit a job: I get an "address already in use" exception that crashes the program. It says "Failed to bind to " and lists the exact port and address of the master.

At this point, the only items I have set in my are SPARK_MASTER_IP and SPARK_MASTER_PORT (non-standard, set to 5060).

The next step I took was to explicitly set SPARK_LOCAL_IP on the master to. This allows the master to send out the jobs successfully; however, it ends up canceling the stage after running this command several times:

14/06/25 21:00:47 INFO AppClient$ClientActor: Executor added: app-20140625210032-0000/8 on worker-20140625205623-machine2-53597 (machine2:53597) with 8 cores
14/06/25 21:00:47 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140625210032-0000/8 on hostPort machine2:53597 with 8 cores, 8.0 GB RAM
14/06/25 21:00:47 INFO AppClient$ClientActor: Executor updated: app-20140625210032-0000/8 is now RUNNING
14/06/25 21:00:49 INFO AppClient$ClientActor: Executor updated: app-20140625210032-0000/8 is now FAILED (Command exited with code 1)

The "/8" started at "/1" and eventually becomes "/9" and then "/10", at which point the program crashes. The worker on machine2 shows similar messages in its logs; here are the last few:

14/06/25 21:00:31 INFO Worker: Executor app-20140625210032-0000/9 finished with state FAILED message Command exited with code 1 exitStatus 1
14/06/25 21:00:31 INFO Worker: Asked to launch executor app-20140625210032-0000/10 for app_name
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/06/25 21:00:32 INFO ExecutorRunner: Launch command: "java" "-cp" "::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar" "-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@localhost:5060/user/CoarseGrainedScheduler" "10" "machine2" "8" "akka.tcp://sparkWorker@machine2:53597/user/Worker" "app-20140625210032-0000"
14/06/25 21:00:33 INFO Worker: Executor app-20140625210032-0000/10 finished with state FAILED message Command exited with code 1 exitStatus 1

The part that seemed strange to me is "akka.tcp://spark@localhost:5060/user/CoarseGrainedScheduler": 5060 is the master port number (I set it to 5060), and yet it references localhost. Is this why machine2 can't seem to report back to the master once the job is submitted? (The logs from the worker on the master node indicate that it's running just fine.)
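Whatever the root cause, an executor on machine2 that dials localhost:5060 reaches machine2's own loopback interface rather than machine1. The hypothetical helpers below pull the akka.tcp:// URLs out of an ExecutorRunner launch command like the one above and flag those whose host is loopback. The argument layout is taken from that single log line and may differ across Spark versions:

```python
import ipaddress
import shlex
from urllib.parse import urlsplit


def driver_url_hosts(launch_command: str) -> list[tuple[str, str]]:
    """Return (url, host) for every akka.tcp:// argument in a launch command."""
    pairs = []
    for arg in shlex.split(launch_command):  # honours the double quotes in the log line
        if arg.startswith("akka.tcp://"):
            pairs.append((arg, urlsplit(arg).hostname or ""))
    return pairs


def is_loopback_host(host: str) -> bool:
    """True for "localhost" or any loopback IP literal (127.0.0.0/8, ::1)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # an ordinary hostname such as "machine2"
        return False


def loopback_urls(launch_command: str) -> list[str]:
    """akka.tcp:// URLs that point at the executing machine's own loopback interface."""
    return [url for url, host in driver_url_hosts(launch_command) if is_loopback_host(host)]
```

For the launch command above, loopback_urls should return only the CoarseGrainedScheduler URL, while the sparkWorker@machine2 URL is left alone.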

I appreciate any assistance you can offer!

Shannon Quinn