spark-dev mailing list archives

From Chang Ya-Hsuan <sumti...@gmail.com>
Subject value of sc.defaultParallelism
Date Wed, 23 Dec 2015 16:09:38 GMT
python version: 2.7.9
os: ubuntu 14.04
spark: 1.5.2

I run a standalone Spark cluster on localhost, and use the following code to access
sc.defaultParallelism

# a.py
import pyspark
sc = pyspark.SparkContext()
print(sc.defaultParallelism)

and use the following command to submit it:

$  spark-submit --master spark://yahsuan-vm:7077 a.py

It prints 2; however, the Spark web UI shows that I have 4 cores.

According to http://spark.apache.org/docs/latest/configuration.html:

spark.default.parallelism
Default: for distributed shuffle operations like reduceByKey and join, the
largest number of partitions in a parent RDD. For operations like parallelize
with no parent RDDs, it depends on the cluster manager:

   - Local mode: number of cores on the local machine
   - Mesos fine grained mode: 8
   - Others: total number of cores on all executor nodes or 2, whichever is
   larger

Meaning: default number of partitions in RDDs returned by transformations like
join, reduceByKey, and parallelize when not set by user.
It seems I should get 4 rather than 2.
Have I misunderstood the document?
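
If it helps, the workaround I am considering is to set the property explicitly
(just a sketch; I am assuming that an explicitly set spark.default.parallelism
overrides the value derived from the cluster manager, and the value 4 below is
only an example):

# b.py (hypothetical variant of a.py that pins spark.default.parallelism)
import pyspark

# assumption: an explicit setting takes precedence over the computed default
conf = pyspark.SparkConf().set("spark.default.parallelism", "4")
sc = pyspark.SparkContext(conf=conf)
print(sc.defaultParallelism)

submitted with the same command as above, or equivalently by passing
--conf spark.default.parallelism=4 to spark-submit without changing the script.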

-- 
-- 張雅軒
