spark-user mailing list archives

From Shangyu Luo <lsy...@gmail.com>
Subject Re: The functionality of daemon.py?
Date Tue, 08 Oct 2013 16:43:35 GMT
Also, I found that 'daemon.py' continues running on a worker node even
after I terminate the Spark job on the master node. That seems a little
strange to me.


2013/10/8 Shangyu Luo <lsyurd@gmail.com>

> Hello Jey,
> Thank you for answering. I have found that there are about 6 or 7
> 'daemon.py' processes on one worker node. Does each core get its own
> 'daemon.py' process? What determines how many 'daemon.py' processes run on
> one worker node? I have also noticed many Spark-related Java processes on
> a worker node; if the Java process on a worker node is only responsible
> for communication, why does Spark need so many Java processes?
> Overall, I think the main problem with my program is memory allocation.
> More specifically, spark-env.sh has two options, *SPARK_DAEMON_MEMORY* and
> *SPARK_DAEMON_JAVA_OPTS*. I can also set *spark.executor.memory* in
> SPARK_JAVA_OPTS. So if I have 68g of memory on a worker node, how should I
> distribute memory among these options? At present I use the default values
> for SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS and set
> spark.executor.memory to 20g. It seems that Spark stores RDDs within the
> spark.executor.memory allocation, and I find that each 'daemon.py' also
> consumes about 7g of memory. After running my program for a while, it uses
> up all the memory on a worker node and the master node reports connection
> errors. (I have 5 worker nodes, each with 8 cores.) So I am a little
> confused about what each of the three options is responsible for and how
> to divide memory among them.
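> For concreteness, the kind of split I have in mind would look roughly like
> the following in spark-env.sh (the numbers are only meant to illustrate
> the question, not settings I know to be correct):
>
>   # spark-env.sh -- illustrative memory split on a 68g, 8-core worker node
>   # standalone master/worker daemons (currently left at the default;
>   # 1g here is just a placeholder value)
>   SPARK_DAEMON_MEMORY=1g
>   SPARK_DAEMON_JAVA_OPTS=""
>   # executor JVM heap, as I set it today
>   SPARK_JAVA_OPTS="-Dspark.executor.memory=20g"
>   # whatever remains has to cover the Python side: with 8 cores and ~7g
>   # per daemon.py process, that is already ~56g, which does not fit next
>   # to a 20g executor heap on a 68g node
>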
> Any suggestions would be appreciated.
> Thanks!
>
> Best,
> Shangyu
>
>
> 2013/10/8 Jey Kottalam <jey@cs.berkeley.edu>
>
>> Hi Shangyu,
>>
>> The daemon.py python process is the actual PySpark worker process, and
>> is launched by the Spark worker when running Python jobs. So, when
>> using PySpark, the "real computation" is handled by a python process
>> (via daemon.py), not a java process.
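>>
>> If you want to confirm this on a worker node, a quick check along these
>> lines (just a sketch; exact flags may differ on your OS) will list the
>> daemon.py processes together with their parent PIDs and memory use:
>>
>>   # PySpark worker processes forked via pyspark/daemon.py
>>   pgrep -fl daemon.py
>>   # show each with its parent PID and resident memory (RSS, in KB)
>>   ps -p "$(pgrep -d, -f daemon.py)" -o pid,ppid,rss,args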
>>
>> Hope that helps,
>> -Jey
>>
>> On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo <lsyurd@gmail.com> wrote:
>> > Hello!
>> > I am using the Python version of Spark 0.7.3. Recently, when I ran some
>> > Spark programs on a cluster, I found that some processes invoked by
>> > spark-0.7.3/python/pyspark/daemon.py would occupy the CPU for a long
>> > time and consume a lot of memory (e.g., 5g per process). It seemed that
>> > the Java process, which was invoked by
>> > java -cp
>> > :/usr/lib/spark-0.7.3/conf:/usr/lib/spark-0.7.3/core/target/scala-2.9.3/classes
>> > ..., was 'competing' with daemon.py for CPU resources. From my
>> > understanding, the Java process should be responsible for the 'real'
>> > computation in Spark.
>> > So I am wondering what work daemon.py actually does. Is it normal for
>> > it to consume so much CPU and memory?
>> > Thanks!
>> >
>> >
>> > Best,
>> > Shangyu Luo
>> > --
>> > --
>> >
>> > Shangyu, Luo
>> > Department of Computer Science
>> > Rice University
>> >
>>
>
>
>
> --
> --
>
> Shangyu, Luo
> Department of Computer Science
> Rice University
>
> --
> Not Just Think About It, But Do It!
> --
> Success is never final.
> --
> Losers always whine about their best
>



-- 
--

Shangyu, Luo
Department of Computer Science
Rice University

--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best
