spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vamshi Talla <>
Subject Re: repartition
Date Mon, 09 Jul 2018 02:32:51 GMT
Hi Ravi,

RDDs are always immutable, so you cannot change them, instead you create new ones by transforming
one. Repartition is a transformation, so it lazily evaluated, hence computed only when you
call an action on it.

Vamshi Talla

On Jul 8, 2018, at 12:26 PM, <<>>
<<>> wrote:


Can anyone clarify how repartition works please ?

  *   I have a DataFrame df which has only one partition:

// Returns 1

•         I repartitioned it by passing “3” and assigned it a new DataFrame newdf

val newdf = df.repartition(3)

•         newdf shows 3 as number of partitions
// Returns 3

•         df still shows 1

// Return 1

My Question is that,

  1.  How does repartition work ? Does it copy original dataframe and create X partitions
as specified by  repartition ?  If that is the case, aren’t there two copies of same data
in memory as shown in below diagram ?

Or my understanding is incorrect ?

As per executions above,  looks like there are two copies as after repartition, df still has
1 partition !!

  1.  Repartition is executed immediately or it waits for some trigger [kind of action] ?



View raw message