spark-user mailing list archives

From Shangyu Luo <lsy...@gmail.com>
Subject Re: The functionality of daemon.py?
Date Tue, 08 Oct 2013 16:37:29 GMT
Hello Jey,
Thank you for answering. I have found that there are about 6 or 7
'daemon.py' processes on one worker node. Will each core have its own
'daemon.py' process? What determines how many 'daemon.py' processes run on
a worker node? I have also found that there are many Spark-related Java
processes on a worker node. If the Java process on a worker node is only
responsible for communication, why does Spark need so many Java processes?
Overall, I think the main problem with my program is memory
allocation. More specifically, spark-env.sh has two options,
*SPARK_DAEMON_MEMORY* and *SPARK_DAEMON_JAVA_OPTS*, and I can also set
*spark.executor.memory* in SPARK_JAVA_OPTS. If I have 68g of memory on a
worker node, how should I distribute memory among these options? At present,
I use the default values for SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS
and set spark.executor.memory to 20g. It seems that Spark stores RDDs within
the spark.executor.memory allocation, and I find that each 'daemon.py' also
consumes about 7g of memory. After my program has been running for a while,
it uses up all the memory on a worker node and the master node reports
connection errors. (I have 5 worker nodes, each with 8 cores.) So I am a
little confused about what each of these three options is responsible for
and how to distribute memory among them.
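The per-node arithmetic behind this can be sketched in a few lines of Python. This is a rough sketch only, assuming the figures reported above (68g per node, spark.executor.memory of 20g, about 7g per 'daemon.py' process) and ignoring JVM overhead, the Spark worker daemon itself, and the operating system. The 8-process case is a hypothetical one-per-core scenario, not something confirmed in this thread.

```python
# Hypothetical per-node memory budget built from the figures reported above.
# Assumptions: each daemon.py process uses ~7 GB (as observed), the executor JVM
# heap is 20 GB (spark.executor.memory), and the node has 68 GB in total.
# JVM overhead, the worker daemon, and the OS are deliberately ignored.
NODE_MEMORY_GB = 68
EXECUTOR_MEMORY_GB = 20
DAEMON_PY_MEMORY_GB = 7


def node_demand_gb(num_daemon_processes: int) -> int:
    """Total memory requested on one worker node, in GB."""
    return EXECUTOR_MEMORY_GB + num_daemon_processes * DAEMON_PY_MEMORY_GB


# 6 and 7 match the observed process counts; 8 assumes one process per core.
for n in (6, 7, 8):
    demand = node_demand_gb(n)
    status = "within" if demand <= NODE_MEMORY_GB else "over"
    print(f"{n} daemon.py processes: {demand} GB ({status} the {NODE_MEMORY_GB} GB node limit)")
```

Under these assumptions, 7 'daemon.py' processes already push the total past 68g, which is consistent with a worker node running out of memory.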
Any suggestion will be appreciated.
Thanks!

Best,
Shangyu


2013/10/8 Jey Kottalam <jey@cs.berkeley.edu>

> Hi Shangyu,
>
> The daemon.py python process is the actual PySpark worker process, and
> is launched by the Spark worker when running Python jobs. So, when
> using PySpark, the "real computation" is handled by a python process
> (via daemon.py), not a java process.
>
> Hope that helps,
> -Jey
>
> On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo <lsyurd@gmail.com> wrote:
> > Hello!
> > I am using Spark 0.7.3 with Python. Recently, when I ran some Spark
> > programs on a cluster, I found that some processes invoked by
> > spark-0.7.3/python/pyspark/daemon.py would occupy the CPU for a long time
> > and consume a lot of memory (e.g., 5g for each process). It seemed that
> > the Java process, which was invoked by
> > java -cp :/usr/lib/spark-0.7.3/conf:/usr/lib/spark-0.7.3/core/target/scala-2.9.3/classes ...
> > was 'competing' with daemon.py for CPU resources. From my
> > understanding, the Java process should be responsible for the 'real'
> > computation in Spark.
> > So I am wondering what work daemon.py does. Is it normal for it
> > to consume so much CPU and memory?
> > Thanks!
> >
> >
> > Best,
> > Shangyu Luo
> > --
> > --
> >
> > Shangyu, Luo
> > Department of Computer Science
> > Rice University
> >
>



-- 
--

Shangyu, Luo
Department of Computer Science
Rice University

--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best
