spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcelo Vanzin <van...@cloudera.com>
Subject Re: RFC: [SPARK-529] Create constants for known config variables.
Date Tue, 24 Jun 2014 17:17:19 GMT
Hi Matei, thanks for the comments.

On Mon, Jun 23, 2014 at 7:58 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> When we did the configuration pull request, we actually avoided having a big list of
defaults in one class file, because this creates a file that all the components in the project
depend on. For example, since we have some settings specific to streaming and the REPL, do
we want those settings to appear in a file that’s in “core”? It might be better to just
make sure we use constants for defaults in the code, or maybe have a separate class per project.

I'm actually following the style used in Hadoop here, but expanding it
a little bit to add type-safety too. (e.g., see DFSConfigKeys.java in
HDFS.) I sort of see your point of "everybody needs to depend on this
file", but then all the affected code already depends on SparkConf.

> The other problem with this kind of change is that it’s disruptive to all the other
ongoing patches, so I wouldn’t consider it high-priority right now. We haven’t had a ton
of problems with settings being mistyped.

Yes, large changes like this are always disruptive. But since it
mostly touches setup code, not the innards of algorithms and the like,
it shouldn't be hard to merge things.

> If you do want to do something like this though, apart from the comment above about modules,
please make sure this is not a public API. As soon as we add it to the API, it means we can’t
change or remove those config settings. I’d also suggest giving each config setting a single
name instead of having “ui”, “shuffle”, etc objects, since the chained calls to conf.ui.port.value
look somewhat confusing.

Making it not public is easy; but then how do we want to expose this
to users? It's normal right now, as you say, to hardcode the settings
in the code (or config files - which this change would not help with).
In the process of writing the p.o.c., I found cases of the same
setting with different casing in different source files, and the same
setting with different defaults. So maybe it has not affected any
users yet, but it's easy to miss things like that with the current
approach.

I don't have a good suggestion for separating the module-specific
configs; I kinda liked the "everything under SparkConf" approach and
tried to group settings in a way that made sense. Creating objects in
each module to hold the configs for those is possible, but would
probably look a bit uglier.

But in any case, this is not urgent, more of a nice thing to have in my opinion.

-- 
Marcelo

Mime
View raw message