spark-user mailing list archives

From Cesar Flores <>
Subject Optimal way to re-partition from a single partition
Date Mon, 08 Feb 2016 22:30:45 GMT
I have a DataFrame which I sort using the orderBy function. This operation
causes my DataFrame to end up in a single partition. After using those
results, I would like to re-partition it into a larger number of partitions.
Currently I am just doing:

// df is a DataFrame with a single partition and around 14 million records
val rdd = df.rdd.coalesce(100, shuffle = true)
val newDF = hc.createDataFrame(rdd, df.schema)

This process is really slow. Is there another way to achieve this, or to
optimize it (perhaps by tweaking a Spark configuration parameter)?
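One alternative worth considering (a sketch, not something confirmed in this thread): DataFrame has its own repartition method, which performs the same shuffle but stays inside the DataFrame API, avoiding the round-trip through the RDD API and the extra createDataFrame call. The DataFrame name `df`, the HiveContext `hc`, and the target count of 100 are taken from the snippet above; whether this is faster in practice would need to be measured.

```scala
// Sketch: repartition the DataFrame directly instead of dropping to the
// RDD API. repartition(100) triggers a full shuffle into 100 partitions,
// the same work coalesce(100, shuffle = true) does on the underlying RDD,
// but Spark keeps the schema, so no createDataFrame step is needed.
val newDF = df.repartition(100)

// Relatedly, the number of partitions produced by shuffle operations such
// as orderBy is governed by the spark.sql.shuffle.partitions setting
// (default 200); raising or lowering it may also be worth experimenting with.
```

Note that the ordering established by orderBy is not guaranteed to survive a subsequent full shuffle, so this only helps if downstream steps do not depend on the global sort order.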

Thanks a lot
Cesar Flores
