spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ye Xianjin <advance...@gmail.com>
Subject Re: defaultMinPartitions in textFile
Date Tue, 22 Jul 2014 03:37:20 GMT
well, I think you miss this line of code in SparkContext.scala
line 1242-1243(master):
 /** Default min number of partitions for Hadoop RDDs when not given by user */
  def defaultMinPartitions: Int = math.min(defaultParallelism, 2)

so the defaultMinPartitions will be 2 unless the defaultParallelism is less than 2...


--  
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, July 22, 2014 at 10:18 AM, Wang, Jensen wrote:

> Hi,  
>         I started to use spark on yarn recently and found a problem while tuning my program.
>   
> When SparkContext is initialized as sc and ready to read text file from hdfs, the textFile(path,
defaultMinPartitions) method is called.
> I traced down the second parameter in the spark source code and finally found this:
>    conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))  in  CoarseGrainedSchedulerBackend.scala
>   
> I do not specify the property “spark.default.parallelism” anywhere so the getInt
will return value from the larger one between totalCoreCount and 2.
>   
> When I submit the application using spark-submit and specify the parameter: --num-executors
 2   --executor-cores 6, I suppose the totalCoreCount will be  
> 2*6 = 12, so defaultMinPartitions will be 12.
>   
> But when I print the value of defaultMinPartitions in my program, I still get 2 in return,
 How does this happen, or where do I make a mistake?
>  
>  
>  



Mime
View raw message