spark-dev mailing list archives

From Ryan Blue <>
Subject Re: [Discuss] Datasource v2 support for Kerberos
Date Mon, 24 Sep 2018 17:35:59 GMT
Dale, what do you think about the option that I suggested? I think that's
different from the ones that you just listed.

Basically, the idea is to have a "shared" set of options that are passed to
all sources. This would not be a whitelist; it would be a namespace whose
contents are passed in everywhere. That way, Kerberos options would be set
in the shared space, but could still be set directly on a source if you
want to override them.

The problem I have with your option 1 is that it requires a whitelist,
which is difficult to maintain and doesn't have obvious behavior. If a user
wants to share an option, it has to be a special one. Otherwise the user
has to wait until we add it to a whitelist, which is slow.

I don't think your option 2 works because that's no better than what we do
today. And as you said, isolating config is a good goal.

Your option 3 is basically a whitelist, but with additional interfaces to
activate the option sets to forward. I think that's a bit too intrusive and
shares the problems that a whitelist has.

The option I'm proposing gets around those issues because it is obvious
what is happening. Any option under the shared namespace is copied to all
sources and catalogs. That doesn't require Spark to do anything to support
specific sets of options and is predictable behavior for users to
understand. It also allows us to maintain separation instead of passing all
options. I think this is a good option overall.
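
As a rough sketch of that behavior, assuming a flat key/value config: the
`shared.` and `source.<name>.` prefixes and the `options_for_source` helper
below are hypothetical names for illustration, not existing Spark settings or
APIs.

```python
from typing import Dict

# Hypothetical prefixes; the thread does not fix actual key names.
SHARED_PREFIX = "shared."
SOURCE_PREFIX = "source."


def options_for_source(conf: Dict[str, str], source: str) -> Dict[str, str]:
    """Build the options one source would see under the shared-namespace idea."""
    own_prefix = f"{SOURCE_PREFIX}{source}."
    # Everything under the shared namespace is copied to every source.
    shared = {
        key[len(SHARED_PREFIX):]: value
        for key, value in conf.items()
        if key.startswith(SHARED_PREFIX)
    }
    # Options set directly on the source are kept separate per source.
    own = {
        key[len(own_prefix):]: value
        for key, value in conf.items()
        if key.startswith(own_prefix)
    }
    # Source-specific values override shared ones with the same name.
    return {**shared, **own}


conf = {
    "shared.kerberos.principal": "alice@EXAMPLE.COM",
    "shared.kerberos.keytab": "/etc/security/alice.keytab",
    "source.hbase.kerberos.principal": "hbase-svc@EXAMPLE.COM",
    "source.jdbc.url": "jdbc:postgresql://db.example.com/sales",
    "spark.executor.memory": "4g",  # outside both namespaces: not forwarded
}

hbase_options = options_for_source(conf, "hbase")
jdbc_options = options_for_source(conf, "jdbc")
```

Here the `jdbc` source would receive both shared Kerberos options plus its own
`url`, the `hbase` source would see its own principal in place of the shared
one, and unrelated keys such as `spark.executor.memory` reach neither. No
whitelist is consulted at any point.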

What do you think?


On Sun, Sep 23, 2018 at 5:21 PM tigerquoll <> wrote:

> I believe the current spark config system is unfortunate in the way it has
> grown - you have no way of telling which sub-systems use which
> configuration options without direct and detailed reading of the code.
> Isolating config items for datasources into separate namespaces (rather
> than using a whitelist) is a nice idea - unfortunately in this case we are
> dealing with configuration items that have been exposed to end-users in
> their current form for a significant amount of time, and Kerberos
> cross-cuts not only datasources, but also things like YARN.
> So given that fact - the best options for a way forward I can think of are:
> 1. Whitelisting of specific sub sections of the configuration space, or
> 2. Just pass in a Map[String,String] of all config values
> 3. Implement a specific interface for data sources to indicate/implement
> Kerberos support
> Option (1) is pretty arbitrary, and more than likely the whitelist will
> change from version to version as additional items get added to it.  Data
> sources will develop dependencies on certain configuration values being
> present in the white list.
> Option (2) would work, but continues the practice of having a vaguely
> specified grab-bag of config items as a dependency for practically all
> Spark code.
> I am beginning to warm to option (3), it would be a clean way of
> declaring that a data source supports Kerberos, and also a cleanly
> specified way of injecting the relevant Kerberos configuration information
> into the data source - and we will not need to change any user-facing
> configuration items either.

Ryan Blue
Software Engineer
