spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: repartitioning RDDS
Date Thu, 31 Oct 2013 15:49:50 GMT
Stephen is exactly correct; I just wanted to point out that in Spark 0.8.1
and above, a "repartition" function has been added as a clearer way to
accomplish what you want. ("Coalescing" into a larger number of partitions
doesn't make much linguistic sense.)
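In Spark itself the call would be rdd.repartition(4), which behaves like
rdd.coalesce(4, shuffle = true). The sketch below is a toy, single-process
Python model of that idea, not Spark's code. An RDD is represented as a list
of partitions, and every element is routed to partition hash(key) % n, so n
may exceed the current partition count. Using a running index as the key is
an assumption made to keep the output deterministic; Spark picks its own
keys. The names coalesce_with_shuffle and repartition are hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_with_shuffle(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of coalesce(n, shuffle=true): each element gets a key and
    is routed to partition hash(key) % n, so n may be larger than the
    current number of partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    out: List[List[T]] = [[] for _ in range(n)]
    key = 0  # assumption: a running index as the key (Spark chooses its own keys)
    for part in partitions:
        for elem in part:
            # Same "hash of key modulo partition count" rule a hash partitioner uses.
            out[hash(key) % n].append(elem)
            key += 1
    return out


def repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Hypothetical helper: repartition is coalesce with shuffling switched on."""
    return coalesce_with_shuffle(partitions, n)


skewed = [[1, 2, 3, 4, 5, 6, 7, 8], [9]]  # one large partition, one tiny one
print(repartition(skewed, 4))
# [[1, 5, 9], [2, 6], [3, 7], [4, 8]]
```

This one call both produces finer-grained partitions (2 become 4) and
rebalances the skew (sizes 8 and 1 become 3, 2, 2 and 2).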


On Thu, Oct 31, 2013 at 7:48 AM, Stephen Haberman <
stephen.haberman@gmail.com> wrote:

>
> > Is it possible to repartition RDDs other than by the coalesce method?
> > I am primarily interested in making finer grained partitioning or
> > rebalancing an unbalanced partitioning, without coalescing.
>
> I believe if you use the shuffle=true parameter, coalesce will do what
> you want, and essentially becomes a general "repartition" method.
>
> Specifically, yes: while shuffle=false can only make larger partitions,
> with shuffle=true you can break your partitions up into many smaller
> partitions, with the content based on a hash partitioner.
>
> I believe that's what you're asking for?
>
> - Stephen
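The limitation of shuffle=false described above can be shown with a toy,
single-process Python model; this is not Spark's implementation. Without a
shuffle, existing partitions can only be merged whole into groups, so the
count can go down but never up, and a single oversized partition can never be
split. Grouping contiguous parent partitions is a simplifying assumption
(Spark's own grouping logic is more involved), and coalesce_no_shuffle is a
hypothetical name.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_no_shuffle(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of coalesce(n, shuffle=false): merge whole parent partitions
    into at most n groups; no element moves out of its parent's group.
    Requesting more partitions than exist leaves the count unchanged."""
    if n < 1:
        raise ValueError("n must be at least 1")
    target = min(n, len(partitions))  # cannot grow without a shuffle
    groups: List[List[T]] = [[] for _ in range(target)]
    for i, part in enumerate(partitions):
        # Assumption: contiguous runs of parent partitions share a group.
        groups[i * target // len(partitions)].extend(part)
    return groups


parts = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce_no_shuffle(parts, 2))  # [[1, 2, 3], [4, 5, 6, 7]]
print(coalesce_no_shuffle(parts, 8))  # [[1, 2], [3], [4, 5, 6], [7]]
```

Asking for 8 partitions still yields 4. That is why finer-grained
partitioning or rebalancing needs shuffle=true.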
