spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: Request to document the direct relationship between other configurations
Date Wed, 12 Feb 2020 02:22:40 GMT
Sure, adding "[DISCUSS]" is a good practice to label it. I had to do it
although it might be "redundant" :-) since anyone can give feedback to any
thread in Spark dev mailing list, and discuss.

This is actually more prevailing given my rough reading of configuration
files. I would like to see this missing relationship as a bad pattern,
started from a personal preference.

> Personally I'd rather not think someone won't understand setting
`.enabled` to `false` means the functionality is disabled and effectively
it disables all sub-configurations.
> E.g. when `spark.sql.adaptive.enabled` is `false`, all the configurations
for `spark.sql.adaptive.*` are implicitly no-op. For me this is pretty
intuitive and the one of major
> benefits of the structured configurations.

I don't think this is a good idea we assume for users to know such
contexts. One might think
`spark.sql.adaptive.shuffle.fetchShuffleBlocksInBatch.enabled` can
partially enable the feature. It is better to be explicit to document since
some of configurations are even difficult for users to confirm if it is
working or not.
For instance, one might think setting 'spark.eventLog.rolling.maxFileSize'
automatically enables rolling. Then, they realise the log is not rolling
later after the file
size becomes bigger.


2020년 2월 12일 (수) 오전 10:47, Jungtaek Lim <kabhwan.opensource@gmail.com>님이
작성:

> I'm sorry if I miss something, but this is ideally better to be started as
> [DISCUSS] as I haven't seen any reference to have consensus on this
> practice.
>
> For me it's just there're two different practices co-existing on the
> codebase, meaning it's closer to the preference of individual (with
> implicitly agreeing that others have different preferences), or it hasn't
> been discussed thoughtfully.
>
> Personally I'd rather not think someone won't understand setting
> `.enabled` to `false` means the functionality is disabled and effectively
> it disables all sub-configurations. E.g. when `spark.sql.adaptive.enabled`
> is `false`, all the configurations for `spark.sql.adaptive.*` are
> implicitly no-op. For me this is pretty intuitive and the one of major
> benefits of the structured configurations.
>
> If we want to make it explicit, "all" sub-configurations should have
> redundant part of the doc. More redundant if the condition is nested. I
> agree this is the good step of "be kind" but less pragmatic.
>
> I'd be happy to follow the consensus we would make in this thread.
> Appreciate more voices.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Wed, Feb 12, 2020 at 10:36 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
>> > I don't plan to document this officially yet
>> Just to prevent confusion, I meant I don't yet plan to document the fact
>> that we should write the relationships in configurations as a code/review
>> guideline in https://spark.apache.org/contributing.html
>>
>>
>> 2020년 2월 12일 (수) 오전 9:57, Hyukjin Kwon <gurwls223@gmail.com>님이
작성:
>>
>>> Hi all,
>>>
>>> I happened to review some PRs and I noticed that some configurations
>>> don't have some information
>>> necessary.
>>>
>>> To be explicit, I would like to make sure we document the direct
>>> relationship between other configurations
>>> in the documentation. For example,
>>> `spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled`
>>> can be only enabled when `spark.sql.adaptive.enabled` is enabled. That's
>>> clearly documented.
>>> We're good in general given that we document them in general in Apache
>>> Spark.
>>> See 'spark.task.reaper.enabled', 'spark.dynamicAllocation.enabled',
>>> 'spark.sql.parquet.filterPushdown', etc.
>>>
>>> However, I noticed such a pattern that such information is missing in
>>> some components in general, for example,
>>> `spark.history.fs.cleaner.*`, `spark.history.kerberos.*` and
>>> `spark.history.ui.acls.* `
>>>
>>> I hope we all start to document such information. Logically users can't
>>> know the relationship and I myself
>>> had to read the codes to confirm when I review.
>>> I don't plan to document this officially yet because to me it looks a
>>> pretty logical request to me; however,
>>> let me know if you guys have some different opinions.
>>>
>>> Thanks.
>>>
>>>
>>>

Mime
View raw message