spark-user mailing list archives

From <ryanda...@gmail.com>
Subject repartition
Date Sun, 08 Jul 2018 16:26:52 GMT
Hi,

Can anyone clarify how repartition works, please?

* I have a DataFrame df which has only one partition:

// Returns 1
df.rdd.getNumPartitions

* I repartitioned it by passing 3 and assigned the result to a new
  DataFrame, newdf:

val newdf = df.repartition(3)

* newdf shows 3 as the number of partitions:

// Returns 3
newdf.rdd.getNumPartitions

* df still shows 1:

// Returns 1
df.rdd.getNumPartitions
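
The steps above can be put together into a self-contained sketch along
these lines. The local SparkSession, the object and method names, and the
100-row DataFrame built with spark.range are assumptions made only for
illustration; they stand in for the real df, which is not shown here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RepartitionRepro {
  // Returns the partition counts of df, of the repartitioned newdf,
  // and of df again after repartition(3) has been called on it.
  def partitionCounts(df: DataFrame): (Int, Int, Int) = {
    val before = df.rdd.getNumPartitions
    val newdf = df.repartition(3)
    (before, newdf.rdd.getNumPartitions, df.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch runnable.
    val spark = SparkSession.builder()
      .appName("repartition-repro")
      .master("local[1]")
      .getOrCreate()
    try {
      // Hypothetical stand-in for df: 100 rows forced into one partition.
      val df = spark.range(0L, 100L, 1L, 1).toDF("id")
      println(partitionCounts(df))
    } finally spark.stop()
  }
}
```

The three numbers printed correspond, in order, to the three
getNumPartitions calls listed above.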

 

My questions are:

1. How does repartition work? Does it copy the original DataFrame and
create the number of partitions passed to repartition? If so, aren't
there two copies of the same data in memory, as shown in the diagram
below? Or is my understanding incorrect?

Judging by the output above, it looks like there are two copies, since
df still has 1 partition after the repartition.

 

2. Is repartition executed immediately, or does it wait for some trigger
(an action of some kind)?
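
One way this could be probed is sketched below, under the same
assumptions as before (a local SparkSession and a toy 100-row DataFrame,
both hypothetical). explain() only prints the physical plan, while
count() is an action, so watching the Spark UI around each call should
show at which point the shuffle actually runs.

```scala
import org.apache.spark.sql.SparkSession

object RepartitionLazyCheck {
  // Builds the toy DataFrame, repartitions it, and counts the rows.
  def repartitionedCount(spark: SparkSession): Long = {
    // Hypothetical stand-in for df: 100 rows in a single partition.
    val df = spark.range(0L, 100L, 1L, 1).toDF("id")
    val newdf = df.repartition(3)
    // explain() prints the physical plan for newdf.
    newdf.explain()
    // count() is an action, so it has to run the plan to get a result.
    newdf.count()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch runnable.
    val spark = SparkSession.builder()
      .appName("repartition-lazy-check")
      .master("local[1]")
      .getOrCreate()
    try println(repartitionedCount(spark))
    finally spark.stop()
  }
}
```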

 

 



 

Thanks,

Ravi

