spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Zhang <zjf...@gmail.com>
Subject Re: Is the RDD's Partitions determined before hand ?
Date Wed, 04 Mar 2015 10:17:30 GMT
Hi Sean,

 > If you know a stage needs unusually high parallelism for example you can
repartition further for that stage.

The problem is we may don't know whether high parallelism is needed. e.g.
for the join operator, high parallelism may only be necessary for some
dataset that lots of data can join together while for other dataset high
parallelism may not be necessary if only a few data can join together.

So my question is that unable changing parallelism at runtime dynamically
may not be flexible.



On Wed, Mar 4, 2015 at 5:36 PM, Sean Owen <sowen@cloudera.com> wrote:

> Hm, what do you mean? You can control, to some extent, the number of
> partitions when you read the data, and can repartition if needed.
>
> You can set the default parallelism too so that it takes effect for most
> ops thay create an RDD. One # of partitions is usually about right for all
> work (2x or so the number of execution slots).
>
> If you know a stage needs unusually high parallelism for example you can
> repartition further for that stage.
>  On Mar 4, 2015 1:50 AM, "Jeff Zhang" <zjffdu@gmail.com> wrote:
>
>> Thanks Sean.
>>
>> But if the partitions of RDD is determined before hand, it would not be
>> flexible to run the same program on the different dataset. Although for the
>> first stage the partitions can be determined by the input data set, for the
>> intermediate stage it is not possible. Users have to create policy to
>> repartition or coalesce based on the data set size.
>>
>>
>> On Tue, Mar 3, 2015 at 6:29 PM, Sean Owen <sowen@cloudera.com> wrote:
>>
>>> An RDD has a certain fixed number of partitions, yes. You can't change
>>> an RDD. You can repartition() or coalese() and RDD to make a new one
>>> with a different number of RDDs, possibly requiring a shuffle.
>>>
>>> On Tue, Mar 3, 2015 at 10:21 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>> > I mean is it possible to change the partition number at runtime. Thanks
>>> >
>>> >
>>> > --
>>> > Best Regards
>>> >
>>> > Jeff Zhang
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>


-- 
Best Regards

Jeff Zhang

Mime
View raw message