spark-user mailing list archives

From Naveen Kumar Pokala <npok...@spcapitaliq.com>
Subject RE: Parallelize on spark context
Date Fri, 07 Nov 2014 07:49:04 GMT
Hi,

In the documentation I found the following:

spark.default.parallelism

· Local mode: number of cores on the local machine
· Mesos fine grained mode: 8
· Others: total number of cores on all executor nodes or 2, whichever is larger
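
The three documented defaults above can be restated as a small piece of plain Java. This is only a sketch of the rule as quoted from the documentation: the class, enum, and method names are hypothetical and are not part of the Spark API, and `totalCores` is assumed to be the number of cores actually available to the application.

```java
// Hypothetical helper, NOT part of the Spark API: it only restates the
// documented defaults for spark.default.parallelism quoted above.
final class DefaultParallelismSketch {

    // The three cases listed in the documentation excerpt.
    enum Mode { LOCAL, MESOS_FINE_GRAINED, OTHER }

    // totalCores: cores on the local machine (LOCAL) or the total number
    // of cores on all executor nodes (OTHER). Ignored for Mesos fine grained.
    static int defaultParallelism(Mode mode, int totalCores) {
        switch (mode) {
            case LOCAL:
                return totalCores;
            case MESOS_FINE_GRAINED:
                return 8;
            default:
                // "total number of cores on all executor nodes or 2,
                // whichever is larger"
                return Math.max(totalCores, 2);
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultParallelism(Mode.OTHER, 48)); // prints 48
        System.out.println(defaultParallelism(Mode.OTHER, 1));  // prints 2
    }
}
```

Under this rule the result depends on how many cores the application really has, which may differ from the number of cores physically present in the cluster.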


I am using a 2-node cluster with 48 cores (24+24). As per the above, the data should be split into
48 datasets, each of size 1000/48 = 20.83, so around 20 or 21 elements.

But it is dividing the data into 2 sets of 500 elements each.

I have also used sc.parallelize(data, 10). That gives 10 datasets of size 100, with 8 datasets
executing on one node and 2 datasets on the other node.
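
The partition sizes mentioned in this message (2 x 500, 10 x 100, and roughly 20 or 21 for 48 partitions) can be checked with a small stand-alone sketch. It assumes the list is cut into contiguous, near-equal index ranges, with slice i covering [i*length/numSlices, (i+1)*length/numSlices). That matches the sizes reported here, but the exact boundaries Spark picks are an assumption, and `SliceSketch` and `sliceSizes` are hypothetical names, not Spark API.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch, NOT Spark code: it assumes a list of
// `length` elements is cut into `numSlices` contiguous, near-equal ranges.
final class SliceSketch {

    // Size of each slice when slice i covers the index range
    // [i * length / numSlices, (i + 1) * length / numSlices).
    static int[] sliceSizes(int length, int numSlices) {
        int[] sizes = new int[numSlices];
        for (int i = 0; i < numSlices; i++) {
            // long arithmetic avoids int overflow for large inputs
            int start = (int) ((long) i * length / numSlices);
            int end = (int) ((long) (i + 1) * length / numSlices);
            sizes[i] = end - start;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 1000 elements in 2 slices: two slices of 500
        System.out.println(Arrays.toString(sliceSizes(1000, 2)));
        // 1000 elements in 10 slices: ten slices of 100
        System.out.println(Arrays.toString(sliceSizes(1000, 10)));
        // 1000 elements in 48 slices: every slice has 20 or 21 elements
        System.out.println(Arrays.toString(sliceSizes(1000, 48)));
    }
}
```

With 48 slices the sizes cannot all be equal because 1000 is not divisible by 48, which is why the expected size of 20.83 shows up as a mix of 20 and 21.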

How can I check how many cores are being used to complete the tasks for those 8 datasets? (Is there
a command or UI to check that?)

Regards,
Naveen.


From: holden.karau@gmail.com [mailto:holden.karau@gmail.com] On Behalf Of Holden Karau
Sent: Friday, November 07, 2014 12:46 PM
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Parallelize on spark context

Hi Naveen,

By default, when we call parallelize, the data is split into the default number of partitions
(which we can control with the property spark.default.parallelism). If we just want a specific
call to parallelize to use a different number of partitions, we can instead call
sc.parallelize(data, numPartitions). The default value of this property is documented at
http://spark.apache.org/docs/latest/configuration.html#spark-properties

Cheers,

Holden :)

On Thu, Nov 6, 2014 at 10:43 PM, Naveen Kumar Pokala <npokala@spcapitaliq.com> wrote:
Hi,

JavaRDD<Integer> distData = sc.parallelize(data);

On what basis does parallelize split the data into multiple datasets? How can we control
how many of these datasets are executed per executor?

For example, my data is a list of 1000 integers and I have a 2-node YARN cluster. It is dividing
the data into 2 batches of size 500.

Regards,
Naveen.



--
Cell : 425-233-8271