From Naveen Kumar Pokala <>
Subject Number cores split up
Date Thu, 06 Nov 2014 07:27:54 GMT

I have a 2-node YARN cluster and I am using Spark 1.1.0 to submit my tasks.

According to the Spark documentation, the number of cores used is the maximum number of
cores available. Does that mean each node runs as many threads as it has cores to process
the job assigned to that node?

For example:

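        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 1000; i++)
            data.add(i);  // assumed loop body

        JavaRDD<Integer> distData = sc.parallelize(data);
        distData.map(  // assumed: the function below is applied with map
            new Function<Integer, Integer>() {
                public Integer call(Integer arg0) throws Exception {
                    return arg0 * arg0;
                }
            });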

The above program divides my RDD into 2 batches of 500 elements each and submits them to the executors.
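
As a rough illustration of the split described above, here is a minimal plain-Java sketch (no Spark dependency) of cutting a 1000-element list into contiguous, near-equal index ranges. The class SliceSketch, the helper sliceBounds, and the numSlices parameter are hypothetical names used only for this example, and the even-split-by-index rule is an assumption about how the batches are formed, not something confirmed for this cluster.

```java
import java.util.ArrayList;
import java.util.List;

class SliceSketch {
    // Hypothetical helper: returns [start, end) index ranges for
    // numSlices contiguous slices of a collection with `length` elements.
    // Assumes an even split by index; not taken from Spark itself.
    static List<int[]> sliceBounds(int length, int numSlices) {
        List<int[]> bounds = new ArrayList<int[]>();
        for (int i = 0; i < numSlices; i++) {
            // long arithmetic avoids int overflow for large collections
            int start = (int) ((long) i * length / numSlices);
            int end = (int) ((long) (i + 1) * length / numSlices);
            bounds.add(new int[] {start, end});
        }
        return bounds;
    }

    public static void main(String[] args) {
        // 1000 elements in 2 slices, as in the program above
        for (int[] b : sliceBounds(1000, 2)) {
            System.out.println("[" + b[0] + ", " + b[1] + ") size=" + (b[1] - b[0]));
        }
    }
}
```

With 1000 elements and 2 slices this prints the ranges [0, 500) and [500, 1000), each of size 500, which matches the two batches I observe.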

1)  So each executor will use all the cores of its node's CPU to process its batch of 500. Am I right?

2)  If so, does that mean each executor uses multithreading? Is execution on a node parallel
or sequential?

3)  How can I check how many cores an executor is using to process my jobs?

4)  Is there any way to control how the batches are divided among the nodes?

Please clarify the above.

Thanks & Regards,

