spark-user mailing list archives

From Horia <ho...@alum.berkeley.edu>
Subject Re: Worker failed to connect when build with SPARK_HADOOP_VERSION=2.2.0
Date Mon, 02 Dec 2013 06:59:54 GMT
Has this been resolved?

Forgive me if I missed the follow-up, but I've been having the exact same
problem.

- Horia



On Fri, Nov 22, 2013 at 5:38 AM, Maxime Lemaire <digital.mxl@gmail.com> wrote:

> Hi all,
> When I build Spark with Hadoop 2.2.0 support, the workers can't connect
> to the Spark master anymore.
> The network is up and the hostnames are correct. tcpdump clearly shows
> the workers trying to connect (tcpdump output at the end).
>
> The same setup with Spark built without SPARK_HADOOP_VERSION (or with SPARK_HADOOP_VERSION=2.0.5-alpha)
> works fine!
>
> Some details:
>
> pmtx-master01 : master
> pmtx-master02 : slave
>
> (the behavior is the same if I launch both master and slave on the same box)
>
> Building with Hadoop 2.2.0 support:
>
> fresh install on pmtx-master01:
> # SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
> ... build successful
> #
>
> fresh install on pmtx-master02:
> # SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
> ... build successful
> #
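>
> (Side note: the Hadoop version gets baked into the assembly jar name, so
> a quick way to verify that both boxes are really running the same build
> is to compare the assembly jars; the path below assumes the default
> 0.8.0-incubating layout, adjust if yours differs:)
>
> # ls -l assembly/target/scala-2.9.3/spark-assembly-*.jar
> # md5sum assembly/target/scala-2.9.3/spark-assembly-*.jar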
>
> On pmtx-master01:
> # ./bin/start-master.sh
> starting org.apache.spark.deploy.master.Master, logging to
> /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
> # netstat -an | grep 7077
> tcp6       0      0 10.90.XX.XX:7077        :::*                    LISTEN
> #
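>
> (One thing I notice: the master only shows a tcp6 socket above. I don't
> know whether that matters, but from the standalone docs the bind address
> can be pinned explicitly on both boxes; this is an assumption on my
> part, not something I have verified to change anything:)
>
> # grep MASTER conf/spark-env.sh
> export SPARK_MASTER_IP=10.90.XX.XX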
>
> On pmtx-master02:
> # nc -v pmtx-master01 7077
> pmtx-master01 [10.90.XX.XX] 7077 (?) open
> # ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
> 13/11/22 10:57:50 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 10:57:50 INFO Worker: Starting Spark worker pmtx-master02:42271
> with 8 cores, 22.6 GB RAM
> 13/11/22 10:57:50 INFO Worker: Spark home: /cluster/bin/spark
> 13/11/22 10:57:50 INFO WorkerWebUI: Started Worker web UI at
> http://pmtx-master02:8081
> 13/11/22 10:57:50 INFO Worker: Connecting to master
> spark://pmtx-master01:7077
> 13/11/22 10:57:50 ERROR Worker: Connection to master failed! Shutting down.
> #
>
> With spark-shell on pmtx-master02:
> # MASTER=spark://pmtx-master01:7077 ./spark-shell
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
>       /_/
>
> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.6.0_31)
> Initializing interpreter...
> Creating SparkContext...
> 13/11/22 11:19:29 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 11:19:29 INFO SparkEnv: Registering BlockManagerMaster
> 13/11/22 11:19:29 INFO MemoryStore: MemoryStore started with capacity
> 323.9 MB.
> 13/11/22 11:19:29 INFO DiskStore: Created local directory at
> /tmp/spark-local-20131122111929-3e3c
> 13/11/22 11:19:29 INFO ConnectionManager: Bound socket to port 42249 with
> id = ConnectionManagerId(pmtx-master02,42249)
> 13/11/22 11:19:29 INFO BlockManagerMaster: Trying to register BlockManager
> 13/11/22 11:19:29 INFO BlockManagerMaster: Registered BlockManager
> 13/11/22 11:19:29 INFO HttpBroadcast: Broadcast server started at
> http://10.90.66.67:52531
> 13/11/22 11:19:29 INFO SparkEnv: Registering MapOutputTracker
> 13/11/22 11:19:29 INFO HttpFileServer: HTTP File server directory is
> /tmp/spark-40525f81-f883-45d5-92ad-bbff44ecf435
> 13/11/22 11:19:29 INFO SparkUI: Started Spark Web UI at
> http://pmtx-master02:4040
> 13/11/22 11:19:29 INFO Client$ClientActor: Connecting to master
> spark://pmtx-master01:7077
> 13/11/22 11:19:30 ERROR Client$ClientActor: Connection to master failed;
> stopping client
> 13/11/22 11:19:30 ERROR SparkDeploySchedulerBackend: Disconnected from
> Spark cluster!
> 13/11/22 11:19:30 ERROR ClusterScheduler: Exiting due to error from
> cluster scheduler: Disconnected from Spark cluster
>
> ---- snip ----
>
> WORKING: Building with Hadoop 2.0.5-alpha support
>
> On pmtx-master01, now building with Hadoop 2.0.5-alpha:
> # sbt/sbt clean
> ...
> # SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
> ...
> # ./bin/start-master.sh
> starting org.apache.spark.deploy.master.Master, logging to
> /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
>
> Same build on pmtx-master02:
> # sbt/sbt clean
> ... build successful ...
> # SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
> ... build successful ...
> # ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
> 13/11/22 11:25:34 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 11:25:34 INFO Worker: Starting Spark worker pmtx-master02:33768
> with 8 cores, 22.6 GB RAM
> 13/11/22 11:25:34 INFO Worker: Spark home: /cluster/bin/spark
> 13/11/22 11:25:34 INFO WorkerWebUI: Started Worker web UI at
> http://pmtx-master02:8081
> 13/11/22 11:25:34 INFO Worker: Connecting to master
> spark://pmtx-master01:7077
> 13/11/22 11:25:34 INFO Worker: Successfully registered with master
> #
>
> With spark-shell on pmtx-master02:
> # MASTER=spark://pmtx-master01:7077 ./spark-shell
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
>       /_/
>
> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.6.0_31)
> Initializing interpreter...
> Creating SparkContext...
> 13/11/22 11:23:12 INFO Slf4jEventHandler: Slf4jEventHandler started
> 13/11/22 11:23:12 INFO SparkEnv: Registering BlockManagerMaster
> 13/11/22 11:23:12 INFO MemoryStore: MemoryStore started with capacity
> 323.9 MB.
> 13/11/22 11:23:12 INFO DiskStore: Created local directory at
> /tmp/spark-local-20131122112312-3d8b
> 13/11/22 11:23:12 INFO ConnectionManager: Bound socket to port 58826 with
> id = ConnectionManagerId(pmtx-master02,58826)
> 13/11/22 11:23:12 INFO BlockManagerMaster: Trying to register BlockManager
> 13/11/22 11:23:12 INFO BlockManagerMaster: Registered BlockManager
> 13/11/22 11:23:12 INFO HttpBroadcast: Broadcast server started at
> http://10.90.66.67:39067
> 13/11/22 11:23:12 INFO SparkEnv: Registering MapOutputTracker
> 13/11/22 11:23:12 INFO HttpFileServer: HTTP File server directory is
> /tmp/spark-ded7bcc1-bacf-4158-b20f-5b2fa6936e8b
> 13/11/22 11:23:12 INFO SparkUI: Started Spark Web UI at
> http://pmtx-master02:4040
> 13/11/22 11:23:12 INFO Client$ClientActor: Connecting to master
> spark://pmtx-master01:7077
> Spark context available as sc.
> 13/11/22 11:23:12 INFO SparkDeploySchedulerBackend: Connected to Spark
> cluster with app ID app-20131122112312-0000
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala>
> #
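>
> For completeness, a trivial job from that shell to confirm work really
> runs on the cluster; counting 1000 elements should obviously come back
> as 1000:
>
> scala> sc.parallelize(1 to 1000).count()
> res0: Long = 1000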
>
> Please be aware that I really don't know the Spark communication
> protocol, so forgive me if I'm misunderstanding something; I'm only
> making assumptions about what's happening.
> As you can see in the tcpdump output, when the connection fails the
> slave sends empty data packets (TCP header only, no PSH flag, length 0)
> when it should start the conversation by saying "hello i am sparkWorker
> pmtx-master02" (4th packet, line 19).
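>
> To make my assumption concrete: as far as I understand, the worker
> announces itself to the master as a remote Akka message, roughly like
> the sketch below. The names here are guesses for illustration, not the
> actual Spark sources; if the two builds ended up with incompatible wire
> formats, the handshake would die before this message is ever delivered,
> which would match the empty packets in the capture.
>
> import akka.actor._
>
> // Illustrative message type; not the real Spark deploy message.
> case class RegisterWorker(id: String, host: String, port: Int,
>                           cores: Int, memoryMB: Int)
>
> class WorkerSketch(masterUrl: String) extends Actor {
>   override def preStart() {
>     // Look up the master's remote actor and send the application-level
>     // "hello i am sparkWorker pmtx-master02" that never shows up in the
>     // failing capture.
>     val master = context.actorFor(masterUrl)
>     master ! RegisterWorker("worker-1", "pmtx-master02", 42271, 8, 22 * 1024)
>   }
>   def receive = {
>     case reply => println("master replied: " + reply)
>   }
> }
>
> object WorkerSketchApp extends App {
>   val system = ActorSystem("sparkWorker")
>   system.actorOf(Props(new WorkerSketch(
>     "akka://sparkMaster@pmtx-master01:7077/user/Master")))
> }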
>
> Tcpdump output:
> Connection failed (Hadoop 2.2.0): http://pastebin.com/6N8tEgUf
> Connection successful (Hadoop 2.0.5-alpha): http://pastebin.com/CegYAjMj
>
> Also, I'm not familiar with log4j, so if you have any tips for getting
> more log information I will try them (I'm using the default properties
> in log4j.properties).
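>
> From poking at conf/log4j.properties.template, my guess is that the
> relevant knob is the root log level; something like this in
> conf/log4j.properties (my assumption, adapted from the template) should
> make the deploy classes much more talkative:
>
> # Set everything to DEBUG instead of the default INFO
> log4j.rootCategory=DEBUG, console
> log4j.appender.console=org.apache.log4j.ConsoleAppender
> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n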
>
> Hadoop 2.2.0 is great and Spark 0.8 is awesome, so please help me make
> them work together! :-)
>
> Thanks
>
> maxx
>
