spark-user mailing list archives

From peay <p...@protonmail.com>
Subject Configuration for unit testing and sql.shuffle.partitions
Date Tue, 12 Sep 2017 21:46:37 GMT
Hello,

I am running unit tests with Spark DataFrames, and I am looking for configuration tweaks that
would make tests faster. Usually, I use a local[2] or local[4] master.

Something that has been bothering me is that most of my stages end up using 200 partitions,
regardless of whether I repartition the input. This seems like overkill for small unit
tests that barely have 200 rows per DataFrame.

I believe spark.sql.shuffle.partitions used to control this, but it seems to be gone, and I
could not find any information on what mechanism or setting replaces it, or the corresponding
JIRA.

Does anyone have experience to share on how best to tune Spark for very small local runs like this?

Thanks!