spark-user mailing list archives

From Horia <ho...@alum.berkeley.edu>
Subject Re: Worker failed to connect when build with SPARK_HADOOP_VERSION=2.2.0
Date Tue, 03 Dec 2013 22:46:19 GMT
That's strange behavior. Spark has no problem connecting to the HDFS
NameNode (v2.2.0) and reading and writing files, though only from the
Spark shell (and the PySpark shell).
The Spark workers failing to connect to the Spark master shouldn't have
anything to do with the version of Hadoop against which Spark is
compiled... or am I completely missing something?



On Mon, Dec 2, 2013 at 4:11 AM, Maxime Lemaire <maxime.lemaire@wattgo.com> wrote:

> Horia,
> if you don't need YARN support, you can get it to work by setting
> SPARK_YARN to false:
> SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=false sbt/sbt assembly
>
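> (A side note on checking what a given build was compiled against: the
> Hadoop version is baked into the assembly jar name, so a listing along
> these lines should show it; the path and jar name are my guess for a
> 0.8.0 tree:
> # ls assembly/target/scala-2.9.3/
> spark-assembly-0.8.0-incubating-hadoop2.2.0.jar
> )
>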
> Raymond,
> OK, thank you, so that's why; I'm using the latest release, 0.8.0
> (September 25, 2013).
>
>
>
>
> 2013/12/2 Liu, Raymond <raymond.liu@intel.com>
>
>> What version of the code are you using?
>>
>> Support for Hadoop 2.2.0 is not yet merged into trunk. Check out
>> https://github.com/apache/incubator-spark/pull/199
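>>
>> (If you want to try that patch before it is merged, GitHub exposes every
>> pull request as a fetchable ref; a sketch, assuming a git clone of
>> incubator-spark, with the local branch name "pr-199" my own choice:
>> # git fetch https://github.com/apache/incubator-spark pull/199/head:pr-199
>> # git checkout pr-199
>> then rebuild with sbt/sbt assembly as before.)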
>>
>> Best Regards,
>> Raymond Liu
>>
>> From: horia.fsf@gmail.com [mailto:horia.fsf@gmail.com] On Behalf Of Horia
>> Sent: Monday, December 02, 2013 3:00 PM
>> To: user@spark.incubator.apache.org
>> Subject: Re: Worker failed to connect when build with
>> SPARK_HADOOP_VERSION=2.2.0
>>
>> Has this been resolved?
>>
>> Forgive me if I missed the follow-up, but I've been having the exact
>> same problem.
>>
>> - Horia
>>
>>
>> On Fri, Nov 22, 2013 at 5:38 AM, Maxime Lemaire <digital.mxl@gmail.com>
>> wrote:
>> Hi all,
>> When I build Spark with Hadoop 2.2.0 support, the workers can't connect
>> to the Spark master anymore.
>> The network is up and the hostnames are correct. Tcpdump clearly shows
>> the workers trying to connect (tcpdump outputs at the end).
>>
>> The same setup with Spark built without SPARK_HADOOP_VERSION (or
>> with SPARK_HADOOP_VERSION=2.0.5-alpha) works fine!
>>
>> Some details:
>>
>> pmtx-master01 : master
>> pmtx-master02 : slave
>>
>> (the behavior is the same if I launch both the master and the slave on
>> the same box)
>>
>> Building Hadoop 2.2.0 support:
>>
>> Fresh install on pmtx-master01:
>> # SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
>> ... build successful
>> #
>>
>> Fresh install on pmtx-master02:
>> # SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
>> ... build successful
>> #
>>
>> On pmtx-master01:
>> # ./bin/start-master.sh
>> starting org.apache.spark.deploy.master.Master, logging to
>> /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
>> # netstat -an | grep 7077
>> tcp6       0      0 10.90.XX.XX:7077        :::*
>>  LISTEN
>> #
>>
>> On pmtx-master02:
>> # nc -v pmtx-master01 7077
>> pmtx-master01 [10.90.XX.XX] 7077 (?) open
>> # ./spark-class org.apache.spark.deploy.worker.Worker
>> spark://pmtx-master01:7077
>> 13/11/22 10:57:50 INFO Slf4jEventHandler: Slf4jEventHandler started
>> 13/11/22 10:57:50 INFO Worker: Starting Spark worker pmtx-master02:42271
>> with 8 cores, 22.6 GB RAM
>> 13/11/22 10:57:50 INFO Worker: Spark home: /cluster/bin/spark
>> 13/11/22 10:57:50 INFO WorkerWebUI: Started Worker web UI at
>> http://pmtx-master02:8081
>> 13/11/22 10:57:50 INFO Worker: Connecting to master
>> spark://pmtx-master01:7077
>> 13/11/22 10:57:50 ERROR Worker: Connection to master failed! Shutting
>> down.
>> #
>>
>> With spark-shell on pmtx-master02:
>> # MASTER=spark://pmtx-master01:7077 ./spark-shell
>> Welcome to
>>   ____              __
>>  / __/__  ___ _____/ /__
>> _\ \/ _ \/ _ `/ __/  '_/
>>  /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
>>   /_/
>>
>> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java
>> 1.6.0_31)
>> Initializing interpreter...
>> Creating SparkContext...
>> 13/11/22 11:19:29 INFO Slf4jEventHandler: Slf4jEventHandler started
>> 13/11/22 11:19:29 INFO SparkEnv: Registering BlockManagerMaster
>> 13/11/22 11:19:29 INFO MemoryStore: MemoryStore started with capacity
>> 323.9 MB.
>> 13/11/22 11:19:29 INFO DiskStore: Created local directory at
>> /tmp/spark-local-20131122111929-3e3c
>> 13/11/22 11:19:29 INFO ConnectionManager: Bound socket to port 42249 with
>> id = ConnectionManagerId(pmtx-master02,42249)
>> 13/11/22 11:19:29 INFO BlockManagerMaster: Trying to register BlockManager
>> 13/11/22 11:19:29 INFO BlockManagerMaster: Registered BlockManager
>> 13/11/22 11:19:29 INFO HttpBroadcast: Broadcast server started at
>> http://10.90.66.67:52531
>> 13/11/22 11:19:29 INFO SparkEnv: Registering MapOutputTracker
>> 13/11/22 11:19:29 INFO HttpFileServer: HTTP File server directory is
>> /tmp/spark-40525f81-f883-45d5-92ad-bbff44ecf435
>> 13/11/22 11:19:29 INFO SparkUI: Started Spark Web UI at
>> http://pmtx-master02:4040
>> 13/11/22 11:19:29 INFO Client$ClientActor: Connecting to master
>> spark://pmtx-master01:7077
>> 13/11/22 11:19:30 ERROR Client$ClientActor: Connection to master failed;
>> stopping client
>> 13/11/22 11:19:30 ERROR SparkDeploySchedulerBackend: Disconnected from
>> Spark cluster!
>> 13/11/22 11:19:30 ERROR ClusterScheduler: Exiting due to error from
>> cluster scheduler: Disconnected from Spark cluster
>>
>> ---- snip ----
>>
>> WORKING: Building Hadoop 2.0.5-alpha support
>>
>> On pmtx-master01, now building against Hadoop 2.0.5-alpha:
>> # sbt/sbt clean
>> ...
>> # SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
>> ...
>> # ./bin/start-master.sh
>> starting org.apache.spark.deploy.master.Master, logging to
>> /cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
>>
>> The same build on pmtx-master02:
>> # sbt/sbt clean
>> ... build successful ...
>> # SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
>> ... build successful ...
>> # ./spark-class org.apache.spark.deploy.worker.Worker
>> spark://pmtx-master01:7077
>> 13/11/22 11:25:34 INFO Slf4jEventHandler: Slf4jEventHandler started
>> 13/11/22 11:25:34 INFO Worker: Starting Spark worker pmtx-master02:33768
>> with 8 cores, 22.6 GB RAM
>> 13/11/22 11:25:34 INFO Worker: Spark home: /cluster/bin/spark
>> 13/11/22 11:25:34 INFO WorkerWebUI: Started Worker web UI at
>> http://pmtx-master02:8081
>> 13/11/22 11:25:34 INFO Worker: Connecting to master
>> spark://pmtx-master01:7077
>> 13/11/22 11:25:34 INFO Worker: Successfully registered with master
>> #
>>
>> With spark-shell on pmtx-master02:
>> # MASTER=spark://pmtx-master01:7077 ./spark-shell
>> Welcome to
>>   ____              __
>>  / __/__  ___ _____/ /__
>> _\ \/ _ \/ _ `/ __/  '_/
>>  /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
>>   /_/
>>
>> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java
>> 1.6.0_31)
>> Initializing interpreter...
>> Creating SparkContext...
>> 13/11/22 11:23:12 INFO Slf4jEventHandler: Slf4jEventHandler started
>> 13/11/22 11:23:12 INFO SparkEnv: Registering BlockManagerMaster
>> 13/11/22 11:23:12 INFO MemoryStore: MemoryStore started with capacity
>> 323.9 MB.
>> 13/11/22 11:23:12 INFO DiskStore: Created local directory at
>> /tmp/spark-local-20131122112312-3d8b
>> 13/11/22 11:23:12 INFO ConnectionManager: Bound socket to port 58826 with
>> id = ConnectionManagerId(pmtx-master02,58826)
>> 13/11/22 11:23:12 INFO BlockManagerMaster: Trying to register BlockManager
>> 13/11/22 11:23:12 INFO BlockManagerMaster: Registered BlockManager
>> 13/11/22 11:23:12 INFO HttpBroadcast: Broadcast server started at
>> http://10.90.66.67:39067
>> 13/11/22 11:23:12 INFO SparkEnv: Registering MapOutputTracker
>> 13/11/22 11:23:12 INFO HttpFileServer: HTTP File server directory is
>> /tmp/spark-ded7bcc1-bacf-4158-b20f-5b2fa6936e8b
>> 13/11/22 11:23:12 INFO SparkUI: Started Spark Web UI at
>> http://pmtx-master02:4040
>> 13/11/22 11:23:12 INFO Client$ClientActor: Connecting to master
>> spark://pmtx-master01:7077
>> Spark context available as sc.
>> 13/11/22 11:23:12 INFO SparkDeploySchedulerBackend: Connected to Spark
>> cluster with app ID app-20131122112312-0000
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>> scala>
>> #
>>
>> Please be aware that I really don't know the Spark communication
>> protocol, so forgive me if I'm misunderstanding something; I'll make
>> some assumptions about what's happening.
>> As you can see in the tcpdump output, when the connection fails, the
>> slave sends empty data packets (TCP header only, without the P flag and
>> with length 0) when it should be starting the conversation by saying
>> "hello, I am sparkWorker pmtx-master02" (4th packet, line 19).
>>
>> Tcpdump output:
>> Connection failed (Hadoop 2.2.0): http://pastebin.com/6N8tEgUf
>> Connection successful (Hadoop 2.0.5-alpha): http://pastebin.com/CegYAjMj
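>>
>> (For reference, the dumps above were captured with a command along these
>> lines; eth0 is a guess, adjust the interface for your box:
>> # tcpdump -i eth0 -nn -X 'tcp port 7077'
>> -nn skips name and port resolution, and -X prints each packet's payload
>> in hex and ASCII, which is how the empty payloads show up.)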
>>
>> Also, I'm not familiar with log4j, so if you have any tips to get more
>> log information I will try them (I'm using the default properties in
>> log4j.properties).
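>>
>> (My only guess so far, based on the template shipped with 0.8.0 and
>> untested on my side: copy conf/log4j.properties.template to
>> conf/log4j.properties and raise the root level from INFO to DEBUG:
>> # cp conf/log4j.properties.template conf/log4j.properties
>> # sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=DEBUG/' conf/log4j.properties
>> )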
>>
>> Hadoop 2.2.0 is great and Spark 0.8 is awesome, so please help me make
>> them work together! :-)
>>
>> Thanks
>>
>> maxx
>>
>>
>
>
> --
> Maxime Lemaire
> IT Director
> WattGo
>
>  +33 6 76 07 40 60
>  maxime.lemaire@wattgo.com
>
> www.wattgo.com
>
>
