From bmiller1 <bmill...@cs.berkeley.edu>
Subject pyspark crash on mesos
Date Mon, 03 Mar 2014 18:21:37 GMT
Hi All,

After switching from standalone Spark to Mesos, I'm experiencing some
instability.  I'm running pyspark interactively through an iPython notebook
and get the following crash non-deterministically (although pretty reliably
within the first 2000 tasks, often much sooner).

Exception in thread "DAGScheduler" org.apache.spark.SparkException: EOF reached before Python server acknowledged
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:340)
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
	at scala.collection.Iterator$class.foreach(Iterator.scala:772)
	at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
	at org.apache.spark.Accumulators$.add(Accumulators.scala:251)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:662)
	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:437)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
	at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
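
For completeness, the way the notebook gets launched against Mesos is roughly the
following (just a sketch with a placeholder hostname; my real settings are in the
spark-env.sh attached below):

    # Point the pyspark shell at the Mesos master (placeholder hostname).
    export MASTER=mesos://<mesos-master-host>:5050
    # Run the pyspark shell inside an iPython notebook rather than the plain REPL.
    IPYTHON=1 IPYTHON_OPTS="notebook" ./pyspark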

I'm running the following software versions on all machines:
Spark: 0.8.1  (md5: 5d3c56eaf91c7349886d5c70439730b3)
Mesos: 0.13.0  (md5: 220dc9c1db118bc7599d45631da578b9)
Python: 2.7.3  (a Stack Overflow post mentioned that differing Python versions may be
to blame; unless Spark or iPython is specifically invoking an older version under the
hood, mine are all the same)
Ubuntu: 12.04

I've modified mesos-daemon.sh as follows:
I had problems launching the cluster with mesos-start-cluster.sh and traced them to
(what seemed to be) a bug in mesos-daemon.sh, which passed a "--conf" flag that
mesos-slave and mesos-master didn't recognize.  I removed the flag and instead added
code to read in the environment variables from mesos-deploy-env.sh.
mesos-start-cluster.sh then worked as advertised.
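
The gist of the change is roughly the following (a sketch rather than the exact diff;
the path is a placeholder, and my full modified script is attached as mesos-daemon.sh
below):

    # Instead of passing the unrecognized --conf flag to mesos-master/mesos-slave,
    # source mesos-deploy-env.sh so the daemons pick their settings up from the
    # MESOS_* environment variables it exports.
    DEPLOY_ENV=/path/to/deploy/mesos-deploy-env.sh   # placeholder; adjust to your install
    if [ -f "$DEPLOY_ENV" ]; then
        . "$DEPLOY_ENV"
    fi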

In case it's helpful, I've included the following files:
* spark_full_output <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/spark_full_output>: output of the ipython process where the SparkContext was created
* mesos-deploy-env.sh <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-deploy-env.sh>: Mesos config file from a slave (identical to the master's except for MESOS_MASTER)
* spark-env.sh <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/spark-env.sh>: Spark config file
* mesos-master.INFO <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-master.INFO>: log file from mesos-master
* mesos-master.WARNING <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-master.WARNING>: log file from mesos-master
* mesos-daemon.sh <http://apache-spark-user-list.1001560.n3.nabble.com/file/n2255/mesos-daemon.sh>: my modified version of mesos-daemon.sh

In case anybody from Berkeley is interested enough to want to interact with my
deployment directly, my office is in Soda Hall, so that can definitely be arranged.

-Brad Miller



