How many partitions are in your input data set? One possibility is that your input data consists of 10 unsplittable files, so you end up with 10 partitions and therefore only 10 tasks per stage. You could improve this by calling RDD#repartition() to spread the data over more partitions.
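
To make the arithmetic concrete, here is a rough sketch in plain Python (not Spark code). It assumes one task per partition and one core per task; the function name task_schedule is made up for illustration, and the figure of 40 partitions is only an example value.

```python
import math

def task_schedule(num_partitions, machines, cores_per_machine):
    """Model how the tasks of one stage map onto cores (one task per partition)."""
    total_cores = machines * cores_per_machine
    # Assumption: each task occupies exactly one core while it runs.
    busy_cores = min(num_partitions, total_cores)
    # Number of rounds ("waves") needed to run every task once.
    waves = math.ceil(num_partitions / total_cores)
    return total_cores, busy_cores, waves

# 10 partitions on a cluster of 5 machines with 4 cores each:
print(task_schedule(10, 5, 4))  # (20, 10, 1): half of the cores sit idle

# The same cluster after repartitioning to 40 partitions (example value):
print(task_schedule(40, 5, 4))  # (20, 20, 2): all cores busy, two waves
```

So with 10 partitions on 20 cores, at most 10 cores can be doing work at any time, no matter how the cluster is configured.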

Note that mapPartitionsWithIndex is sort of the "main processing loop" for many Spark functions. It iterates through all the elements of a partition and does some computation (probably running your user code) on each of them, so the time reported for it is mostly the time spent in that per-element work.
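
As a rough illustration of what that loop does, here is a plain-Python model (again not Spark code). The names map_partitions_with_index and tag_and_double are hypothetical, and partitions are modelled as ordinary lists processed one after another, whereas Spark would run each partition as a separate task.

```python
def map_partitions_with_index(partitions, f):
    """Apply f(index, iterator) to every partition and collect the results."""
    return [list(f(i, iter(part))) for i, part in enumerate(partitions)]

def tag_and_double(index, rows):
    # Stand-in for the user code: runs once per element of the partition.
    for row in rows:
        yield (index, row * 2)

partitions = [[1, 2], [3], [4, 5, 6]]
result = map_partitions_with_index(partitions, tag_and_double)
print(result)
# [[(0, 2), (0, 4)], [(1, 6)], [(2, 8), (2, 10), (2, 12)]]
```

Seen this way, there is not much to tune in the loop itself; the cost comes from how many elements each partition holds and how expensive the per-element function is.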

You can see the number of partitions in your RDD by visiting the Spark driver web interface. To access this, visit port 8080 on the host running your Standalone Master (assuming you're running in standalone mode), which will have a link to the application web interface. The Tachyon master also has a useful web interface, available at port 19999.

I know a Spark application should take all cores by default. My question is: how do I set the number of tasks on each core?
If one slice means one task, how can I set the slice file size?

My aim in setting the number of tasks is to increase query speed, and I have also found that "mapPartitionsWithIndex at Operator.scala:333" is taking a lot of time. So my other question is:
how do I tune mapPartitionsWithIndex to bring that time down?

I am using Tachyon as the storage system and Shark to query a table, which is a big table. I have 5 machines in a Spark cluster, with 4 cores on each machine.
My questions are:
1. How do I set the number of tasks on each core?
2. Where can I see how many partitions an RDD has?