spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Why does this simple spark program use only one core?
Date Sun, 09 Nov 2014 23:43:28 GMT
Call getNumPartitions() on your RDD to make sure it has the right number of partitions. You
can also specify it when doing parallelize, e.g.

rdd = sc.parallelize(xrange(1000), 10)

This should run in parallel if you have multiple partitions and cores, but it might be that
during part of the process only one node (e.g. the master process) is doing anything.
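To see why the partition count matters, here is a plain-Python sketch (Python 3 syntax, no Spark required) of the same Monte Carlo estimate: the work is split into chunks, analogous to partitions, so a pool of worker processes can run chunks in parallel. The names and chunk sizes here are illustrative, not Spark APIs.

```python
import random
from multiprocessing import Pool

NUM_SAMPLES = 100000
NUM_CHUNKS = 10  # analogous to the numSlices argument to parallelize()

def count_hits(n):
    # Count random points that fall inside the unit quarter-circle.
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y < 1:
            hits += 1
    return hits

if __name__ == "__main__":
    chunk = NUM_SAMPLES // NUM_CHUNKS
    with Pool() as pool:
        # Each chunk is an independent task, so a pool of workers
        # can process several chunks at once on a multicore machine.
        counts = pool.map(count_hits, [chunk] * NUM_CHUNKS)
    pi_estimate = 4.0 * sum(counts) / NUM_SAMPLES
    print("Pi is roughly %f" % pi_estimate)
```

With more chunks than cores, every core stays busy, which is the same reason to pass a partition count to parallelize() rather than leaving everything in one partition.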

Matei


> On Nov 9, 2014, at 9:27 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
> 
> You can set the following entry inside the conf/spark-defaults.conf file 
> 
> spark.cores.max 16
> 
> If you want to read the default value, you can use the following API call:
> 
> sc.defaultParallelism
> 
> where sc is your SparkContext object.
> 
> Thanks
> Best Regards
> 
> On Sun, Nov 9, 2014 at 6:48 PM, ReticulatedPython <person.of.book@gmail.com> wrote:
> So, I'm running this simple program on a 16 core multicore system. I run it
> by issuing the following.
> 
> spark-submit --master local[*] pi.py
> 
> And the code of that program is below. When I use top to watch CPU
> consumption, only 1 core is being utilized. Why is that? Secondly, the Spark
> documentation says that the default parallelism is contained in the property
> spark.default.parallelism. How can I read this property from within my
> Python program?
> 
> """pi.py"""
> from pyspark import SparkContext
> import random
> 
> NUM_SAMPLES = 12500000
> 
> def sample(p):
>     x, y = random.random(), random.random()
>     return 1 if x*x + y*y < 1 else 0
> 
> sc = SparkContext("local", "Test App")
> count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-this-siimple-spark-program-uses-only-one-core-tp18434.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
> 

