spark-user mailing list archives

From Stephen Haberman <stephen.haber...@gmail.com>
Subject Re: repartitioning RDDS
Date Thu, 31 Oct 2013 14:48:32 GMT

> Is it possible to repartition RDDs other than by the coalesce method?
> I am primarily interested in making finer-grained partitions or
> rebalancing an unbalanced partitioning, without coalescing.

I believe if you use the shuffle=true parameter, coalesce will do what
you want, and essentially becomes a general "repartition" method.

Specifically: with shuffle=false, coalesce can only merge partitions into
fewer, larger ones, but with shuffle=true you can also break your partitions
up into many smaller partitions, with each element's new partition chosen by
a hash partitioner.
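To make the difference concrete, here is a minimal plain-Scala model (not Spark code) of what the two modes do to partition layout. It assumes an RDD's partitions can be pictured as a `Vector[Vector[T]]`; the names `CoalesceModel`, `coalesceNoShuffle` and `coalesceWithShuffle` are hypothetical, and the no-shuffle version is simplified (Spark's real coalesce also takes data locality into account when grouping partitions).

```scala
// Hypothetical plain-Scala model of RDD partitions: each inner Vector is one partition.
// This is NOT Spark's implementation; it only mimics the two coalesce modes.
object CoalesceModel {
  type Partitions[T] = Vector[Vector[T]]

  // Models shuffle = false: adjacent partitions are merged, so the count can only go down.
  // Simplified: real Spark also groups by preferred locations.
  def coalesceNoShuffle[T](parts: Partitions[T], n: Int): Partitions[T] = {
    require(n > 0, "n must be positive")
    if (n >= parts.size) parts // without a shuffle we cannot create more partitions
    else
      parts.zipWithIndex
        .groupBy { case (_, i) => i * n / parts.size } // contiguous groups, in order
        .toVector
        .sortBy(_._1)
        .map { case (_, group) => group.sortBy(_._2).flatMap(_._1) }
  }

  // Models shuffle = true: every element is re-routed by its hash, so n may grow or shrink.
  def coalesceWithShuffle[T](parts: Partitions[T], n: Int): Partitions[T] = {
    require(n > 0, "n must be positive")
    val buckets = Vector.fill(n)(Vector.newBuilder[T])
    for (p <- parts; x <- p) {
      val h = x.hashCode % n            // non-negative modulo, similar in spirit
      buckets(if (h < 0) h + n else h) += x // to Spark's HashPartitioner
    }
    buckets.map(_.result())
  }

  def main(args: Array[String]): Unit = {
    // An unbalanced input: one big partition and two tiny ones.
    val skewed: Partitions[Int] = Vector((1 to 90).toVector, Vector(91), Vector(92))
    println(coalesceNoShuffle(skewed, 2).map(_.size))   // Vector(91, 1): still skewed
    println(coalesceWithShuffle(skewed, 8).map(_.size)) // Vector(11, 12, 12, 12, 12, 11, 11, 11)
  }
}
```

In Spark itself the corresponding call is `rdd.coalesce(8, shuffle = true)`; later Spark releases also offer `rdd.repartition(8)`, which is defined as exactly that call. Note that because the model hashes the elements themselves, many duplicate values would all land in the same partition.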

I believe that's what you're asking for?

- Stephen


