spark-dev mailing list archives

From "Reynold Xin" <r...@databricks.com>
Subject Re: [DISCUSS] naming policy of Spark configs
Date Wed, 12 Feb 2020 23:24:55 GMT
This is really cool. We should also be more opinionated about how we specify time and intervals.

On Wed, Feb 12, 2020 at 3:15 PM, Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:

> 
> Thank you, Wenchen.
> 
> The new policy looks clear to me. +1 for the explicit policy.
> 
> So, are we going to revise the existing conf names before the 3.0.0 release?
> 
> Or will it only be applied to new configurations from now on?
> 
> Bests,
> Dongjoon.
> 
> On Wed, Feb 12, 2020 at 7:43 AM Wenchen Fan <cloud0fan@gmail.com> wrote:
> 
> 
>> Hi all,
>> 
>> 
>> I'd like to discuss the naming policy of Spark configs. For now, naming
>> depends on personal preference, which leads to inconsistent names.
>> 
>> 
>> In general, the config name should be a noun that describes its meaning
>> clearly.
>> Good examples:
>> spark.sql.session.timeZone
>> 
>> spark.sql.streaming.continuous.executorQueueSize
>> 
>> spark.sql.statistics.histogram.numBins
>> 
>> Bad examples:
>> spark.sql.defaultSizeInBytes (default size for what?)
>> 
>> 
>> 
>> Also note that a config name has many parts, joined by dots. Each part is a
>> namespace. Don't create namespaces unnecessarily.
>> Good examples:
>> spark.sql.execution.rangeExchange.sampleSizePerPartition
>> 
>> spark.sql.execution.arrow.maxRecordsPerBatch
>> 
>> Bad examples:
>> spark.sql.windowExec.buffer.in.memory.threshold ("in" is not a useful
>> namespace; it would be better as .buffer.inMemoryThreshold)
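To make the suggested rename concrete, here is a minimal Python sketch. `collapse_namespaces` is a hypothetical helper for illustration (not part of Spark) that folds every part after a chosen namespace into a single lowerCamelCase part:

```python
def collapse_namespaces(key: str, keep_prefix: str) -> str:
    """Fold every dot-separated part after `keep_prefix` into one lowerCamelCase part.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if not key.startswith(keep_prefix + "."):
        raise ValueError(f"{key!r} is not under namespace {keep_prefix!r}")
    rest = key[len(keep_prefix) + 1:].split(".")
    # Keep the first part as-is and capitalize the first letter of each later part.
    merged = rest[0] + "".join(p[:1].upper() + p[1:] for p in rest[1:])
    return f"{keep_prefix}.{merged}"


print(collapse_namespaces(
    "spark.sql.windowExec.buffer.in.memory.threshold",
    "spark.sql.windowExec.buffer",
))
# -> spark.sql.windowExec.buffer.inMemoryThreshold
```

Which prefix to keep is still a human judgment; the helper only does the mechanical part of the rename.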
>> 
>> 
>> 
>> For a big feature, we usually need to create an umbrella config to turn it
>> on/off, and other configs for fine-grained control. These configs should
>> share the same namespace, and the umbrella config should be named like
>> featureName.enabled. For example:
>> spark.sql.cbo.enabled
>> 
>> spark.sql.cbo.starSchemaDetection
>> 
>> spark.sql.cbo.starJoinFTRatio
>> spark.sql.cbo.joinReorder.enabled
>> spark.sql.cbo.joinReorder.dp.threshold (BTW "dp" is not a good namespace)
>> 
>> spark.sql.cbo.joinReorder.card.weight (BTW "card" is not a good namespace)
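One side effect of keeping a feature's configs under one namespace is that the umbrella switches enclosing a fine-grained config can be found from its key alone. A minimal sketch, assuming the featureName.enabled convention is followed; `enclosing_umbrellas` is a hypothetical helper, not a Spark API:

```python
from __future__ import annotations

# The CBO configs listed above.
CBO_KEYS = {
    "spark.sql.cbo.enabled",
    "spark.sql.cbo.starSchemaDetection",
    "spark.sql.cbo.starJoinFTRatio",
    "spark.sql.cbo.joinReorder.enabled",
    "spark.sql.cbo.joinReorder.dp.threshold",
    "spark.sql.cbo.joinReorder.card.weight",
}


def enclosing_umbrellas(key: str, keys: set[str]) -> list[str]:
    """Return the `<namespace>.enabled` configs in `keys` that enclose `key`,
    outermost first. Assumes the umbrella naming convention is followed;
    hypothetical helper, not a Spark API.
    """
    parts = key.split(".")
    found = []
    # Check every proper prefix of the key, shortest first.
    for i in range(1, len(parts)):
        candidate = ".".join(parts[:i]) + ".enabled"
        # Skip the key itself when it is an umbrella config.
        if candidate in keys and candidate != key:
            found.append(candidate)
    return found


print(enclosing_umbrellas("spark.sql.cbo.joinReorder.dp.threshold", CBO_KEYS))
# -> ['spark.sql.cbo.enabled', 'spark.sql.cbo.joinReorder.enabled']
```

This lookup only works if names follow the convention; a config placed outside its feature's namespace would silently lose its umbrella.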
>> 
>> 
>> 
>> 
>> For boolean configs, the name should in general end with a verb, e.g.
>> spark.sql.join.preferSortMergeJoin. If the config is for a feature and you
>> can't find a good verb for the feature, featureName.enabled is also good.
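The boolean rule could be approximated by a rough lint check. A sketch in Python; `is_good_boolean_name` and the verb list are hypothetical placeholders, and a real check would need a far larger vocabulary or human review:

```python
import re

# Hypothetical, deliberately tiny verb list; a real check would need a much
# larger vocabulary or human review.
KNOWN_VERBS = frozenset({"prefer", "use", "ignore", "allow", "skip", "convert"})


def is_good_boolean_name(key: str, verbs: frozenset = KNOWN_VERBS) -> bool:
    """Heuristic: a boolean config's last part is `enabled` or starts with a verb."""
    last = key.rsplit(".", 1)[-1]
    if last == "enabled":
        return True
    # Leading lowercase run of a lowerCamelCase part, e.g. "prefer" in "preferSortMergeJoin".
    m = re.match(r"[a-z]+", last)
    return m is not None and m.group(0) in verbs


for name in ("spark.sql.join.preferSortMergeJoin",
             "spark.sql.cbo.enabled",
             "spark.sql.cbo.starSchemaDetection"):
    print(name, is_good_boolean_name(name))
```

With this tiny verb list, spark.sql.join.preferSortMergeJoin and spark.sql.cbo.enabled pass, while spark.sql.cbo.starSchemaDetection is flagged because its last part starts with a noun.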
>> 
>> 
>> I'll update https://spark.apache.org/contributing.html after we reach a
>> consensus here. Any comments are welcome!
>> 
>> 
>> Thanks,
>> Wenchen
>> 
> 
>