spark-dev mailing list archives

From Gengliang Wang <gengliang.w...@databricks.com>
Subject Re: [DISCUSS] naming policy of Spark configs
Date Thu, 13 Feb 2020 00:30:08 GMT
+1, this is really helpful. We should make the SQL configurations
consistent and more readable.

On Wed, Feb 12, 2020 at 3:33 PM Rubén Berenguel <rberenguel@gmail.com>
wrote:

> I love it, it will make configs easier to read and write. Thanks Wenchen.
>
> R
>
> On 13 Feb 2020, at 00:15, Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>
>
> Thank you, Wenchen.
>
> The new policy looks clear to me. +1 for the explicit policy.
>
> So, are we going to revise the existing conf names before the 3.0.0 release?
>
> Or will the policy only apply to new configurations from now on?
>
> Bests,
> Dongjoon.
>
> On Wed, Feb 12, 2020 at 7:43 AM Wenchen Fan <cloud0fan@gmail.com> wrote:
>
>> Hi all,
>>
>> I'd like to discuss the naming policy of Spark configs, as for now naming
>> depends on personal preference, which leads to inconsistent names.
>>
>> In general, the config name should be a noun that describes its meaning
>> clearly.
>> Good examples:
>> spark.sql.session.timeZone
>> spark.sql.streaming.continuous.executorQueueSize
>> spark.sql.statistics.histogram.numBins
>> Bad example:
>> spark.sql.defaultSizeInBytes (default size for what?)
>>
>> Also note that a config name has many parts, joined by dots. Each part is
>> a namespace, so don't create namespaces unnecessarily.
>> Good examples:
>> spark.sql.execution.rangeExchange.sampleSizePerPartition
>> spark.sql.execution.arrow.maxRecordsPerBatch
>> Bad example:
>> spark.sql.windowExec.buffer.in.memory.threshold ("in" is not a useful
>> namespace; .buffer.inMemoryThreshold would be better)
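
The namespace rule above can be partly checked by a tool. The following is a minimal Python sketch, not part of Spark: it splits a config name on dots and, when a non-final part is a filler word (the `FILLER_WORDS` set and the `suggest_name` helper are hypothetical, chosen only for illustration), folds that part and everything after it into one lowerCamelCase leaf. Applied to the bad example, it reproduces the suggested `.buffer.inMemoryThreshold`.

```python
# Hypothetical set of words that carry no meaning as a namespace on their own.
FILLER_WORDS = {"in", "on", "of", "for", "to", "at", "by"}


def suggest_name(config_name: str) -> str:
    """Merge an unnecessary namespace chain into a single camelCase leaf.

    Finds the first dot-separated part (other than the last one) that is a
    filler word, then joins it and all following parts into one lowerCamelCase
    part. Names without such a part are returned unchanged.
    """
    parts = config_name.split(".")
    for i, part in enumerate(parts[:-1]):
        if part in FILLER_WORDS:
            tail = parts[i:]
            # Keep the first word as-is; upper-case only the first letter of the
            # later words so existing camelCase inside a part is preserved.
            leaf = tail[0] + "".join(p[:1].upper() + p[1:] for p in tail[1:])
            return ".".join(parts[:i] + [leaf])
    return config_name


print(suggest_name("spark.sql.windowExec.buffer.in.memory.threshold"))
# spark.sql.windowExec.buffer.inMemoryThreshold
print(suggest_name("spark.sql.execution.arrow.maxRecordsPerBatch"))
# spark.sql.execution.arrow.maxRecordsPerBatch (unchanged)
```

The sketch uses `p[:1].upper() + p[1:]` rather than `str.capitalize()`, because `capitalize()` would lower-case the rest of a part such as `maxRecords`. It only catches the mechanical case; whether a namespace like "dp" or "card" is useful still needs human judgment.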
>>
>> For a big feature, we usually need an umbrella config to turn it on/off,
>> plus other configs for fine-grained control. These configs should share
>> the same namespace, and the umbrella config should be named
>> featureName.enabled. For example:
>> spark.sql.cbo.enabled
>> spark.sql.cbo.starSchemaDetection
>> spark.sql.cbo.starJoinFTRatio
>> spark.sql.cbo.joinReorder.enabled
>> spark.sql.cbo.joinReorder.dp.threshold (BTW "dp" is not a good namespace)
>> spark.sql.cbo.joinReorder.card.weight (BTW "card" is not a good
>> namespace)
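
One benefit of sharing a namespace is that the umbrella switch for any config can be found from the name alone. Below is a hypothetical Python helper (`umbrella_config` is not a Spark API) that walks the dotted prefixes of a name from longest to shortest and returns the first `<prefix>.enabled` that exists, using the CBO configs listed above as input. It relies purely on the naming convention and says nothing about how Spark itself reads these configs.

```python
from typing import Iterable, Optional


def umbrella_config(name: str, all_names: Iterable[str]) -> Optional[str]:
    """Return the nearest enclosing `<namespace>.enabled` config for `name`.

    Walks the dotted prefixes of `name` from longest to shortest and returns
    the first `<prefix>.enabled` present in `all_names`, skipping `name`
    itself. Returns None when no umbrella switch exists.
    """
    known = set(all_names)
    parts = name.split(".")
    for end in range(len(parts) - 1, 0, -1):
        candidate = ".".join(parts[:end]) + ".enabled"
        if candidate != name and candidate in known:
            return candidate
    return None


# The CBO configs from the example above.
CBO_CONFIGS = [
    "spark.sql.cbo.enabled",
    "spark.sql.cbo.starSchemaDetection",
    "spark.sql.cbo.starJoinFTRatio",
    "spark.sql.cbo.joinReorder.enabled",
    "spark.sql.cbo.joinReorder.dp.threshold",
    "spark.sql.cbo.joinReorder.card.weight",
]

for config in CBO_CONFIGS:
    print(f"{config} -> {umbrella_config(config, CBO_CONFIGS)}")
```

With this input, `spark.sql.cbo.joinReorder.dp.threshold` maps to `spark.sql.cbo.joinReorder.enabled`, which in turn maps to `spark.sql.cbo.enabled`, and the top-level `spark.sql.cbo.enabled` has no umbrella (`None`).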
>>
>> For boolean configs, the name should generally end with a verb phrase, e.g.
>> spark.sql.join.preferSortMergeJoin. If the config is for a feature and
>> you can't find a good verb for it, featureName.enabled is also fine.
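
Reading the boolean rule as "the last dot-separated part starts with a verb, or is exactly `enabled`", it can be approximated with a small heuristic. This is a hypothetical Python sketch, not a Spark API; `LEADING_VERBS` is an assumed, deliberately tiny vocabulary, so the check will reject valid names that start with other verbs, and a reviewer still has to make the final call.

```python
import re

# Hypothetical, deliberately small verb list for illustration only; a real
# checker would need a much larger vocabulary or a manual review step.
LEADING_VERBS = {"prefer", "use", "allow", "ignore", "skip", "check", "infer", "merge", "convert"}


def follows_boolean_convention(config_name: str) -> bool:
    """Heuristically check a boolean config name against the naming rule.

    Accepts names whose last part is `enabled` (the featureName.enabled form)
    or whose last part starts with a word from LEADING_VERBS.
    """
    leaf = config_name.rsplit(".", 1)[-1]
    if leaf == "enabled":
        return True
    # Take the leading lowercase run of the camelCase leaf,
    # e.g. "prefer" from "preferSortMergeJoin".
    match = re.match(r"[a-z]+", leaf)
    return match is not None and match.group(0) in LEADING_VERBS


print(follows_boolean_convention("spark.sql.join.preferSortMergeJoin"))  # True
print(follows_boolean_convention("spark.sql.cbo.joinReorder.enabled"))   # True
```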
>>
>> I'll update https://spark.apache.org/contributing.html after we reach a
>> consensus here. Any comments are welcome!
>>
>> Thanks,
>> Wenchen
>>
>>
>>
