spark-user mailing list archives

From Eugen Cepoi <cepoi.eu...@gmail.com>
Subject Re: Dead lock running multiple Spark Jobs on Mesos
Date Tue, 13 May 2014 09:06:36 GMT
I'm seeing a similar issue (but with Spark 0.9.1) when a shell is active.
Multiple jobs run fine, but when the shell is active (even if it is not
using any CPU at the moment) I encounter exactly the same behaviour.

At the moment I don't know what is happening or how to solve it, but I was
planning to look into it more deeply in the next few days. I'll keep you posted.

Eugen

PS: all the jobs run in fine-grained mode.
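
For anyone hitting this in the meantime: since the problem only shows up in fine-grained mode, switching to coarse-grained mode is a one-line config change. A minimal sketch (the master URL and app name below are placeholders, not values from this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Coarse-grained mode: Spark holds long-lived executors on each Mesos slave
// for the lifetime of the application, instead of launching one short-lived
// Mesos task per Spark task as fine-grained mode does.
val conf = new SparkConf()
  .setMaster("mesos://mesos-master:5050") // placeholder Mesos master URL
  .setAppName("MyJob")                    // placeholder app name
  .set("spark.mesos.coarse", "true")
val sc = new SparkContext(conf)
```

The trade-off is that coarse-grained mode reserves CPUs for the whole run, so concurrent jobs no longer share cores dynamically the way fine-grained mode allows.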


2014-05-12 21:29 GMT+02:00 Martin Weindel <martin.weindel@gmail.com>:

> I'm using a current Spark 1.0.0-SNAPSHOT for Hadoop 2.2.0 on Mesos 0.17.0.
>
> If I run a single Spark Job, the job runs fine on Mesos. Running multiple
> Spark Jobs also works if I use the coarse-grained mode
> ("spark.mesos.coarse" = true).
>
> But if I run two Spark Jobs in parallel using the fine-grained mode, the
> jobs seem to block each other after a few seconds.
> In this state the Mesos UI reports neither idle nor used CPUs.
>
> As soon as I kill one job, the other continues normally. See below for some
> log output.
> It looks to me as if something strange happens when assigning resources to
> the two jobs.
>
> Can anybody give me a hint about the cause? The jobs read some HDFS files,
> but have no other communication with external processes.
> Or does anybody have other suggestions on how to analyse this problem?
>
> Thanks,
>
> Martin
>
> -----
> Here is the relevant log output of job1:
>
> INFO 17:53:09,247 Missing parents for Stage 2: List()
>  INFO 17:53:09,250 Submitting Stage 2 (MapPartitionsRDD[9] at mapPartitions
> at HighTemperatureSpansPerLogfile.java:92), which is now runnable
>  INFO 17:53:09,269 Submitting 1 missing tasks from Stage 2
> (MapPartitionsRDD[9] at mapPartitions at
> HighTemperatureSpansPerLogfile.java:92)
>  INFO 17:53:09,269 Adding task set 2.0 with 1 tasks
>
> ................................................................................
> *** at this point the job was killed ***
>
>
> log output of job2:
>  INFO 17:53:04,874 Missing parents for Stage 6: List()
>  INFO 17:53:04,875 Submitting Stage 6 (MappedRDD[23] at values at
> ComputeLogFileTimespan.java:71), which is now runnable
>  INFO 17:53:04,881 Submitting 1 missing tasks from Stage 6 (MappedRDD[23]
> at
> values at ComputeLogFileTimespan.java:71)
>  INFO 17:53:04,882 Adding task set 6.0 with 1 tasks
>
> ................................................................................
> *** at this point the job 1 was killed ***
> INFO 18:01:39,307 Starting task 6.0:0 as TID 7 on executor
> 20140501-141732-308511242-5050-2657-1: ustst019-cep-node2.usu.usu.grp
> (PROCESS_LOCAL)
>  INFO 18:01:39,307 Serialized task 6.0:0 as 3052 bytes in 0 ms
>  INFO 18:01:39,328 Asked to send map output locations for shuffle 2 to
> spark@ustst018-cep-node1.usu.usu.grp:40542
>  INFO 18:01:39,328 Size of output statuses for shuffle 2 is 178 bytes
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Dead-lock-running-multiple-Spark-Jobs-on-Mesos-tp5611.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
