spark-user mailing list archives

From Jey Kottalam <...@cs.berkeley.edu>
Subject Re: The functionality of daemon.py?
Date Tue, 08 Oct 2013 15:00:01 GMT
Hi Shangyu,

The daemon.py Python process is the actual PySpark worker process, and
is launched by the Spark worker when running Python jobs. So, when
using PySpark, the "real computation" is handled by a Python process
(via daemon.py), not by a Java process.
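You can see this directly (a minimal sketch, assuming a local PySpark
setup on a Unix machine; pids and the exact app name are illustrative):
the pids printed by the mapper belong to the Python worker processes
forked via daemon.py, not to the driver or to the JVM.

    import os
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "worker-pid-demo")

    # The driver runs in this Python process.
    print("driver pid: %d" % os.getpid())

    # Each element is tagged with the pid of the Python worker that
    # mapped it; these pids differ from the driver's pid because the
    # mapping runs in forked PySpark worker processes.
    pids = sc.parallelize(range(8)).map(lambda x: (x, os.getpid())).collect()
    print(pids)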

Hope that helps,
-Jey

On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo <lsyurd@gmail.com> wrote:
> Hello!
> I am using the Python version of Spark 0.7.3. Recently, when I ran some Spark
> programs on a cluster, I found that some processes launched by
> spark-0.7.3/python/pyspark/daemon.py would occupy the CPU for a long time and
> consume a lot of memory (e.g., 5 GB per process). It seemed that the Java
> process, which was invoked by
> java -cp
> :/usr/lib/spark-0.7.3/conf:/usr/lib/spark-0.7.3/core/target/scala-2.9.3/classes
> ...  , was 'competing' with daemon.py for CPU resources. From my
> understanding, the Java process should be responsible for the 'real'
> computation in Spark.
> So I am wondering: what work does daemon.py do? Is it normal for it
> to consume a lot of CPU and memory?
> Thanks!
>
>
> Best,
> Shangyu Luo
> --
>
> Shangyu Luo
> Department of Computer Science
> Rice University
>
