spark-dev mailing list archives

From Wenchen Fan
Subject Re: [Discuss] Datasource v2 support for Kerberos
Date Mon, 17 Sep 2018 01:51:18 GMT
I'm +1 for this proposal: "Extend SessionConfigSupport to support passing
specific white-listed configuration values"

One goal of the data source v2 API is to not depend on any high-level APIs like
SparkSession, SQLConf, etc. If users do want to access these high-level
APIs, there is a workaround: calling `SparkSession.getActive` or

In the meantime, I think your use case makes sense. `SessionConfigSupport`
was created for this use case, but it's not powerful enough yet. I think it
should support multiple key-prefixes and a white-list.
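
As a rough illustration of that idea (a sketch only, not the actual Spark
implementation or a proposed API), the filtering could look like the code
below. It uses a plain dict in place of the session configuration. The
function name `extract_session_configs`, its parameters, and the example
keys are all hypothetical; the only thing taken from the thread is the
`spark.datasource.*` namespace and the notion of key-prefixes plus a
white-list.

```python
from typing import Dict, Iterable


def extract_session_configs(
    session_conf: Dict[str, str],
    key_prefixes: Iterable[str],
    white_list: Iterable[str] = (),
) -> Dict[str, str]:
    """Collect the options a data source would receive from the session config.

    Hypothetical helper: models the session config as a plain dict.
    """
    options: Dict[str, str] = {}

    # Namespaced keys: spark.datasource.<prefix>.<name> is passed on as <name>.
    for prefix in key_prefixes:
        full_prefix = f"spark.datasource.{prefix}."
        for key, value in session_conf.items():
            if key.startswith(full_prefix) and len(key) > len(full_prefix):
                options[key[len(full_prefix):]] = value

    # White-listed keys: shared settings are passed through under their full name.
    for key in white_list:
        if key in session_conf:
            options[key] = session_conf[key]

    return options


# Example values are made up for illustration.
conf = {
    "spark.datasource.hbase.zookeeper.quorum": "zk1:2181",
    "spark.datasource.kerberos.userPrincipal": "alice@EXAMPLE.COM",
    "spark.ssl.enabled": "true",
    "spark.executor.memory": "4g",
}

opts = extract_session_configs(
    conf,
    key_prefixes=["hbase", "kerberos"],
    white_list=["spark.ssl.enabled"],
)
print(opts)
```

With this shape, a source that needs shared Kerberos settings would list a
second prefix, and cross-cutting keys such as those under "spark.ssl.*"
would be named explicitly in the white-list, so unrelated session settings
are never handed to the source.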

Feel free to submit a patch, and thanks for looking into it!

On Sun, Sep 16, 2018 at 2:40 PM tigerquoll wrote:

> The current V2 Datasource API provides support for querying a portion of
> the SparkConfig namespace (spark.datasource.*) via the SessionConfigSupport
> API. This was designed with the assumption that all configuration
> information for v2 data sources should be separate from each other.
> Unfortunately, there are some cross-cutting concerns such as authentication
> that touch multiple data sources - this means that common configuration
> items need to be shared amongst multiple data sources.
> In particular, Kerberos setup can use the following configuration items:
> * userPrincipal
> * userKeytabPath
> * krb5ConfPath
> * kerberos debugging flags
> * ${service}.enabled
> * JAAS config
> * ZKServerPrincipal ??
> So potential solutions I can think of to pass this information to various
> data sources are:
> * Pass the entire SparkContext object to data sources (not likely)
> * Pass the entire SparkConfig Map object to data sources
> * Pass all required configuration via environment variables
> * Extend SessionConfigSupport to support passing specific white-listed
> configuration values
> * Add a specific data source v2 API "SupportsKerberos" so that a data
> source can indicate that it supports Kerberos and also provide the means
> to pass needed configuration info.
> * Expand out all Kerberos configuration items to be in each data source
> config namespace that needs it.
> If the data source requires TLS support, then we also need to support
> passing all the configuration values under "spark.ssl.*".
> What do people think? A placeholder issue has been added at SPARK-25329.