spark-user mailing list archives

From Adrian Tanase <>
Subject Re: Partition for each executor
Date Tue, 20 Oct 2015 20:12:28 GMT
I think it will use the default parallelism, which by default equals the total number of cores
in your cluster.

If you want to control it, specify a value for numSlices - the second param to parallelize().
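If it helps to see the arithmetic, here is a minimal Python sketch (my own reconstruction for illustration, not the actual Spark source) of how a collection of n elements is divided into numSlices roughly equal partitions:

```python
# Sketch: split a collection into num_slices contiguous, near-equal
# partitions, the same start/end arithmetic parallelize() applies.
def slice_collection(seq, num_slices):
    n = len(seq)
    return [seq[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

# 14 elements into 14 slices -> one element per partition; Spark's
# scheduler then spreads those partitions across the executors.
parts = slice_collection(list(range(14)), 14)
print([len(p) for p in parts])  # fourteen partitions of size 1
```

Note that this only determines how the data is cut into partitions; where each partition runs is still up to the scheduler.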


On 10/20/15, 6:13 PM, "t3l" <> wrote:

>If I have a cluster with 7 nodes, each having an equal number of cores, and
>create an RDD with sc.parallelize(), it looks as if Spark will always
>try to distribute the partitions.
>(1) Is that something I can rely on?
>(2) Can I rely on sc.parallelize() assigning partitions to as many
>executors as possible? Meaning: let's say I request 7 partitions, is each
>node guaranteed to get 1 of these partitions? If I select 14 partitions, is
>each node guaranteed to grab 2 partitions?
>P.S.: I am aware that for other cases like sc.hadoopFile, this might depend
>on the actual storage location of the data. I am merely asking about the
>sc.parallelize() case.

