spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Setting spark.sql.shuffle.partitions Dynamically
Date Wed, 27 Jul 2016 07:36:52 GMT
Hi,

How about trying adaptive execution in Spark?
https://issues.apache.org/jira/browse/SPARK-9850
This feature is turned off by default because it is still experimental.
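
A minimal sketch of turning it on (the config keys below come from the
Spark 2.x implementation of SPARK-9850; they may change while the feature
is experimental, so check the SQLConf for your version):

import org.apache.spark.sql.SparkSession

// Enable the experimental adaptive execution so Spark decides the number
// of post-shuffle partitions at runtime instead of always using the
// fixed spark.sql.shuffle.partitions value.
val spark = SparkSession.builder()
  .appName("adaptive-shuffle-demo")
  .config("spark.sql.adaptive.enabled", "true")
  // Target bytes per post-shuffle partition; small partitions are
  // coalesced until each is roughly this size (the default is 64 MB).
  .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize",
          (128 * 1024 * 1024).toString)
  .getOrCreate()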

// maropu



On Wed, Jul 27, 2016 at 3:26 PM, Brandon White <bwwinthehouse@gmail.com>
wrote:

> Hello,
>
> My platform runs hundreds of Spark jobs every day, each with its own
> data size ranging from 20 MB to 20 TB. This means that we need to set
> resources dynamically. One major pain point for doing this is
> spark.sql.shuffle.partitions, the number of partitions to use when
> shuffling data for joins or aggregations. It is arbitrarily hard-coded
> to a default of 200. The only way to set this config is in the
> spark-submit command or in the SparkConf before the executor is created.
>
> This creates a lot of problems when I want to set this config dynamically
> based on the in-memory size of a DataFrame. I only know the in-memory size
> of the DataFrame halfway through the Spark job, so I would need to stop the
> context and recreate it in order to set this config.
>
> Is there any better way to set this? How does
> spark.sql.shuffle.partitions work differently from .repartition?
>
> Brandon
>
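
On the last question above: spark.sql.shuffle.partitions sets how many
partitions every shuffle introduced by a join or aggregation produces,
while .repartition(n) explicitly shuffles one specific DataFrame into
exactly n partitions. A minimal sketch of the difference (Spark 2.x API;
the value names are only for illustration), also showing that the conf
can be changed per session at runtime via spark.conf.set:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("shuffle-vs-repartition")
  .getOrCreate()

// Session-wide: every later join/aggregation shuffle in this session
// writes this many partitions (assuming adaptive execution is off).
spark.conf.set("spark.sql.shuffle.partitions", "400")

val df = spark.range(1000000L).toDF("id")

// The groupBy triggers a shuffle, so the result ends up with
// 400 partitions.
val counts = df.groupBy(col("id") % 10).count()

// .repartition shuffles just this one DataFrame into exactly
// 50 partitions, independent of spark.sql.shuffle.partitions.
val fifty = df.repartition(50)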



-- 
---
Takeshi Yamamuro
