From Shannon Quinn <squ...@gatech.edu>
Subject Re: Spark standalone network configuration problems
Date Thu, 26 Jun 2014 13:31:36 GMT
Both machines' /etc/hosts files have each other's IP addresses in them. 
Telnetting from machine2 to machine1 on port 5060 works just fine.
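
Concretely, both files look roughly like this (actual addresses omitted):

   # on machine1
   127.0.0.1       localhost
   <ip>            machine1
   <machine2 ip>   machine2

   # on machine2
   127.0.0.1       localhost
   <ip>            machine1
   <machine2 ip>   machine2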

Here's the output of lsof:

user@machine1:~/spark/spark-1.0.0-bin-hadoop2$ lsof -i:5060
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java    23985 user   30u  IPv6 11092354      0t0  TCP machine1:sip (LISTEN)
java    23985 user   40u  IPv6 11099560      0t0  TCP machine1:sip->machine1:48315 (ESTABLISHED)
java    23985 user   52u  IPv6 11100405      0t0  TCP machine1:sip->machine2:54476 (ESTABLISHED)
java    24157 user   40u  IPv6 11092413      0t0  TCP machine1:48315->machine1:sip (ESTABLISHED)

Ubuntu seems to recognize 5060 as the standard port for "sip"; nothing 
is actually running there besides Spark. lsof just does a s/5060/sip/g 
on its output.
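
(If it helps, running it as "lsof -P -i:5060" skips the name lookup and 
prints the numeric port; the sip/5060 mapping itself just comes from 
/etc/services.)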

Is there something to the fact that every time I comment out 
SPARK_LOCAL_IP in spark-env, it crashes immediately upon spark-submit 
with an "address already in use" error? Or am I barking up the wrong 
tree on that one?
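
For reference, the relevant lines in spark-env.sh are roughly:

   export SPARK_MASTER_IP=<ip>
   export SPARK_MASTER_PORT=5060
   # commenting the next line out is what triggers the "address already in use" crash
   export SPARK_LOCAL_IP=127.0.0.1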

Thanks again for all your help; I hope we can knock this one out.

Shannon

On 6/26/14, 9:13 AM, Akhil Das wrote:
> Do you have <ip>         machine1 in your worker's /etc/hosts also? If 
> so, try telnetting from your machine2 to machine1 on port 5060. Also 
> make sure nothing else is running on port 5060 other than Spark 
> (lsof -i:5060).
>
> Thanks
> Best Regards
>
>
> On Thu, Jun 26, 2014 at 6:35 PM, Shannon Quinn <squinn@gatech.edu> wrote:
>
>     Still running into the same problem. /etc/hosts on the master says
>
>     127.0.0.1    localhost
>     <ip>            machine1
>
>     <ip> is the same address set in spark-env.sh for SPARK_MASTER_IP.
>     Any other ideas?
>
>
>     On 6/26/14, 3:11 AM, Akhil Das wrote:
>>     Hi Shannon,
>>
>>     It should be a configuration issue; check your /etc/hosts and
>>     make sure localhost is not associated with the SPARK_MASTER_IP
>>     you provided.
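>>
>>     For example (addresses here are made up), a line like
>>
>>         127.0.0.1    localhost machine1
>>
>>     is the kind of entry that causes the master to bind to loopback;
>>     it should look more like
>>
>>         127.0.0.1       localhost
>>         192.168.1.10    machine1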
>>
>>     Thanks
>>     Best Regards
>>
>>
>>     On Thu, Jun 26, 2014 at 6:37 AM, Shannon Quinn <squinn@gatech.edu> wrote:
>>
>>         Hi all,
>>
>>         I have a 2-machine Spark network I've set up: a master and
>>         worker on machine1, and worker on machine2. When I run
>>         'sbin/start-all.sh', everything starts up as it should. I see
>>         both workers listed on the UI page. The logs of both workers
>>         indicate successful registration with the Spark master.
>>
>>         The problems begin when I attempt to submit a job: I get an
>>         "address already in use" exception that crashes the program.
>>         It says "Failed to bind to " and lists the exact port and
>>         address of the master.
>>
>>         At this point, the only items I have set in my spark-env.sh
>>         are SPARK_MASTER_IP and SPARK_MASTER_PORT (non-standard, set
>>         to 5060).
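>>
>>         Concretely, spark-env.sh at this point contains roughly the
>>         following (actual address omitted):
>>
>>             export SPARK_MASTER_IP=<ip>
>>             export SPARK_MASTER_PORT=5060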
>>
>>         The next step I took, then, was to explicitly set
>>         SPARK_LOCAL_IP on the master to 127.0.0.1. This allows the
>>         master to successfully send out the jobs; however, it ends up
>>         canceling the stage after running this command several times:
>>
>>         14/06/25 21:00:47 INFO AppClient$ClientActor: Executor added:
>>         app-20140625210032-0000/8 on
>>         worker-20140625205623-machine2-53597 (machine2:53597) with 8
>>         cores
>>         14/06/25 21:00:47 INFO SparkDeploySchedulerBackend: Granted
>>         executor ID app-20140625210032-0000/8 on hostPort
>>         machine2:53597 with 8 cores, 8.0 GB RAM
>>         14/06/25 21:00:47 INFO AppClient$ClientActor: Executor
>>         updated: app-20140625210032-0000/8 is now RUNNING
>>         14/06/25 21:00:49 INFO AppClient$ClientActor: Executor
>>         updated: app-20140625210032-0000/8 is now FAILED (Command
>>         exited with code 1)
>>
>>         The "/8" started at "/1" and eventually reached "/9" and then
>>         "/10", at which point the program crashed. The worker on
>>         machine2 shows similar messages in its logs. Here are the
>>         last few:
>>
>>         14/06/25 21:00:31 INFO Worker: Executor
>>         app-20140625210032-0000/9 finished with state FAILED message
>>         Command exited with code 1 exitStatus 1
>>         14/06/25 21:00:31 INFO Worker: Asked to launch executor
>>         app-20140625210032-0000/10 for app_name
>>         Spark assembly has been built with Hive, including
>>         Datanucleus jars on classpath
>>         14/06/25 21:00:32 INFO ExecutorRunner: Launch command: "java"
>>         "-cp"
>>         "::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar"
>>         "-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M"
>>         "org.apache.spark.executor.CoarseGrainedExecutorBackend"
>>         "*akka.tcp://spark@localhost:5060/user/CoarseGrainedScheduler*"
>>         "10" "machine2" "8"
>>         "akka.tcp://sparkWorker@machine2:53597/user/Worker"
>>         "app-20140625210032-0000"
>>         14/06/25 21:00:33 INFO Worker: Executor
>>         app-20140625210032-0000/10 finished with state FAILED message
>>         Command exited with code 1 exitStatus 1
>>
>>         I highlighted the part that seemed strange to me: that's the
>>         master port number (I set it to 5060), and yet it's
>>         referencing localhost. Is this the reason why machine2
>>         apparently can't seem to give a confirmation to the master
>>         once the job is submitted? (The logs from the worker on the
>>         master node indicate that it's running just fine.)
>>
>>         I appreciate any assistance you can offer!
>>
>>         Regards,
>>         Shannon Quinn
>>
>>
>
>

