spark-user mailing list archives

From Takeshi Yamamuro <>
Subject Re: Setting spark.sql.shuffle.partitions Dynamically
Date Wed, 27 Jul 2016 07:36:52 GMT

How about trying adaptive execution in Spark?
This feature is turned off by default because it is still experimental.
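For illustration, a minimal sketch of enabling it, assuming Spark 2.0's
SparkSession API; the two config keys are the experimental adaptive
execution settings as they exist in 2.0, so verify them against your
version:

// Sketch: turning on experimental adaptive execution (off by default).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("adaptive-shuffle-example")  // placeholder app name
  // Lets Spark choose post-shuffle partition counts at runtime.
  .config("spark.sql.adaptive.enabled", "true")
  // Rough per-partition input size Spark aims for after a shuffle
  // (128 MB here; the default is 64 MB).
  .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "134217728")
  .getOrCreate()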

// maropu

On Wed, Jul 27, 2016 at 3:26 PM, Brandon White <> wrote:

> Hello,
> My platform runs hundreds of Spark jobs every day, each with its own
> data size, anywhere from 20 MB to 20 TB. This means that we need to set
> resources dynamically. One major pain point for doing this is
> spark.sql.shuffle.partitions, the number of partitions to use when
> shuffling data for joins or aggregations. By default it is arbitrarily
> hard-coded to 200. The only way to set this config is in the spark-submit
> command or in the SparkConf before the executor is created.
> This creates a lot of problems when I want to set this config dynamically
> based on the in-memory size of a DataFrame. I only know the in-memory size
> of the DataFrame halfway through the Spark job, so I would need to stop the
> context and recreate it in order to set this config.
> Is there any better way to set this? How does
> spark.sql.shuffle.partitions work differently from .repartition?
> Brandon
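
For reference, a short sketch of the distinction asked about above,
assuming Spark 2.0's SparkSession API; the names spark and df are
placeholders, not from this thread:

// spark.sql.shuffle.partitions is a session-level SQL conf: it is read
// whenever a join or aggregation plans a shuffle, and it can be changed
// at runtime through the session conf.
spark.conf.set("spark.sql.shuffle.partitions", "1000")
val counts = df.groupBy("key").count()  // this shuffle uses 1000 partitions

// .repartition, by contrast, reshuffles one specific DataFrame right away,
// independent of the SQL conf.
val reshaped = df.repartition(1000)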

Takeshi Yamamuro
