spark-user mailing list archives

From Shangyu Luo <>
Subject Re: The functionality of
Date Tue, 08 Oct 2013 16:37:29 GMT
Hello Jey,
Thank you for answering. I have found that there are about 6 or 7
'' processes on one worker node. Does each core have its own ''
process? How is the number of '' processes per worker node decided? I
have also found many Spark-related Java processes on each worker
node; if the Java process on a worker node is only responsible for
communication, why does Spark need so many Java processes?
Overall, I think the main problem with my program is memory
allocation. More specifically, there are three options: SPARK_DAEMON_MEMORY,
SPARK_DAEMON_JAVA_OPTS, and *spark.executor.memory* in SPARK_JAVA_OPTS. So if I have 68g of memory on a
worker node, how should I distribute memory among these options? At present,
I use the default values for SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS
and set spark.executor.memory to 20g. It seems that Spark counts cached RDDs
against spark.executor.memory, and I find that each '' process also consumes
about 7g of memory. After running my program for a while, it uses
up all the memory on a worker node and the master node reports
connection errors. (I have 5 worker nodes, each with 8 cores.) So I am a
little confused about which jobs the three options are responsible for
and how to distribute memory among them.
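For what it's worth, a quick back-of-the-envelope check with the numbers in this thread (20g executor memory, roughly 7g per worker-side process, 8 cores per node, 68g per node) suggests the node can be oversubscribed. A minimal sketch of that arithmetic, assuming one such process runs per core:

```python
# Back-of-the-envelope memory budget for one worker node, using the
# figures from this thread (all values in GB; assumptions, not measurements).
node_memory = 68          # total RAM on one worker node
executor_memory = 20      # spark.executor.memory
per_process_memory = 7    # observed ~7g per worker-side process
cores = 8                 # cores per node; assume one such process per core

worker_side_total = per_process_memory * cores        # 56 GB
grand_total = executor_memory + worker_side_total     # 76 GB

print(f"worker-side processes: {worker_side_total} GB")
print(f"executor + workers:    {grand_total} GB")
print(f"oversubscribed by:     {grand_total - node_memory} GB")
```

If one such process really does run per core, the combined total exceeds the node's 68g, which would be consistent with the out-of-memory and connection errors described above.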
Any suggestions would be appreciated.


2013/10/8 Jey Kottalam <>

> Hi Shangyu,
> The python process is the actual PySpark worker process, and
> is launched by the Spark worker when running Python jobs. So, when
> using PySpark, the "real computation" is handled by a Python process
> (via ), not by a Java process.
> Hope that helps,
> -Jey
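As an aside, the parent/child split Jey describes (a JVM coordinating long-lived Python worker processes that do the actual computation) can be sketched, purely as an analogy, with Python's multiprocessing module. None of the names below are real Spark APIs; this only illustrates the coordinator/worker pattern:

```python
from multiprocessing import Pool

def heavy_task(x):
    # Stand-in for the "real computation": this runs in a child Python
    # process, analogous to a PySpark worker process.
    return x * x

if __name__ == "__main__":
    # The parent process plays the JVM's coordinating role: it forks
    # the workers, ships tasks to them, and collects the results.
    with Pool(processes=4) as pool:
        results = pool.map(heavy_task, range(8))
    print(results)  # squares computed entirely in the child processes
```

In this picture, the CPU and memory consumed by the Python children is expected: they are where the work happens, while the parent mostly waits and shuttles data.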
> On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo <> wrote:
> > Hello!
> > I am using Spark 0.7.3 with the Python version. Recently, when I ran a
> spark
> > program on a cluster, I found that some processes invoked by
> > spark-0.7.3/python/pyspark/ would occupy the CPU for a long time
> and
> > consume a lot of memory (e.g., 5g for each process). It seemed that the Java
> > process, which was invoked by
> > java -cp
> >
> :/usr/lib/spark-0.7.3/conf:/usr/lib/spark-0.7.3/core/target/scala-2.9.3/classes
> > ...  , was 'competing' with the  for CPU resources. From my
> > understanding, the Java process should be responsible for the 'real'
> > computation in Spark.
> > So I am wondering what work the  actually does. Is it normal for
> it
> > to consume a lot of CPU and memory?
> > Thanks!
> >
> >
> > Best,
> > Shangyu Luo
> > --
> > --
> >
> > Shangyu, Luo
> > Department of Computer Science
> > Rice University
> >


Shangyu, Luo
Department of Computer Science
Rice University

Not Just Think About It, But Do It!
Success is never final.
Losers always whine about their best
