spark-user mailing list archives

From Akhil Das <ak...@hacked.work>
Subject Re: Configuration for unit testing and sql.shuffle.partitions
Date Sat, 16 Sep 2017 16:26:17 GMT
spark.sql.shuffle.partitions is still used, I believe. I can see it in the code
<https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L191>
and in the documentation page
<https://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options>.
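
For a local test run, the setting can be lowered when building the session so small DataFrames don't shuffle into 200 partitions. A minimal sketch (the app name and partition count are illustrative, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical unit-test setup: lower the default of 200 shuffle
// partitions to something closer to the size of the test data.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("unit-tests")
  .config("spark.sql.shuffle.partitions", "4")
  .getOrCreate()
```

The same key can also be changed at runtime via spark.conf.set("spark.sql.shuffle.partitions", "4") if the session is shared across tests.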

On Wed, Sep 13, 2017 at 4:46 AM, peay <peay@protonmail.com> wrote:

> Hello,
>
> I am running unit tests with Spark DataFrames, and I am looking for
> configuration tweaks that would make tests faster. Usually, I use a
> local[2] or local[4] master.
>
> Something that has been bothering me is that most of my stages end up
> using 200 partitions, independently of whether I repartition the input.
> This seems a bit overkill for small unit tests that barely have 200 rows
> per DataFrame.
>
> spark.sql.shuffle.partitions used to control this I believe, but it seems
> to be gone and I could not find any information on what mechanism/setting
> replaces it or the corresponding JIRA.
>
> Has anyone experience to share on how to tune Spark best for very small
> local runs like that?
>
> Thanks!
>
>


-- 
Cheers!
