spark-dev mailing list archives

From Wenchen Fan <cloud0...@gmail.com>
Subject Official support of CREATE EXTERNAL TABLE
Date Tue, 06 Oct 2020 14:06:28 GMT
Hi all,

I'd like to start a discussion thread about this topic, as it blocks an
important feature that we are targeting for Spark 3.1: unifying the CREATE
TABLE SQL syntax.

Some background on CREATE EXTERNAL TABLE: it's a somewhat hidden feature
in Spark that exists for Hive compatibility.

When you write native CREATE TABLE syntax, such as `CREATE EXTERNAL TABLE
... USING parquet`, the parser fails and tells you that EXTERNAL can't be
specified.

When you write Hive CREATE TABLE syntax, EXTERNAL can be specified only if
a LOCATION clause or path option is present. For example, `CREATE EXTERNAL
TABLE ... STORED AS parquet` is not allowed because there is no LOCATION
clause or path option. This is not 100% Hive-compatible, since Hive itself
does not require LOCATION for an external table.
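To make the two parser rules above concrete, here is a minimal Python sketch that models them as a predicate. `external_allowed` is a hypothetical helper written for illustration only; it is not a Spark API and does not reproduce Spark's actual parser code or error messages.

```python
def external_allowed(syntax: str, has_location: bool, has_path_option: bool) -> bool:
    """Hypothetical model of the rules described in this message (not Spark code).

    syntax: "native" for CREATE TABLE ... USING <provider>,
            "hive"   for CREATE TABLE ... STORED AS <format>.
    Returns True if the EXTERNAL keyword would be accepted.
    """
    if syntax == "native":
        # Native syntax rejects EXTERNAL outright, with or without LOCATION.
        return False
    if syntax == "hive":
        # Hive syntax accepts EXTERNAL only when a location is supplied.
        return has_location or has_path_option
    raise ValueError(f"unknown syntax: {syntax!r}")


# CREATE EXTERNAL TABLE t ... USING parquet LOCATION '...'   -> rejected
print(external_allowed("native", has_location=True, has_path_option=False))
# CREATE EXTERNAL TABLE t ... STORED AS parquet               -> rejected
print(external_allowed("hive", has_location=False, has_path_option=False))
# CREATE EXTERNAL TABLE t ... STORED AS parquet LOCATION '...' -> accepted
print(external_allowed("hive", has_location=True, has_path_option=False))
```

Hive itself would accept the second statement, which is the compatibility gap mentioned above.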

As we unify the CREATE TABLE SQL syntax, one open question is how to handle
CREATE EXTERNAL TABLE. We can keep it as a hidden feature, as it is today,
or we can officially support it.

Please let us know your thoughts:
1. As an end-user, what do you expect CREATE EXTERNAL TABLE to do? Have you
used it in production before? For what use cases?
2. As a catalog developer, how would you implement EXTERNAL TABLE?
It seems to me that it only makes sense for file sources, where there is a
table directory that the catalog can manage. I'm not sure how to interpret
EXTERNAL in catalogs like JDBC, Cassandra, etc.
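For file sources, the managed/external distinction comes down to who owns the table directory. The Python sketch below illustrates that idea with a hypothetical `FileCatalog` class (not a Spark API): managed tables get a directory under a warehouse path and lose their files on DROP, while tables with a user-supplied location keep their files. Treating any table with an explicit location as external follows how Spark documents custom table paths for file sources; requiring a location for EXTERNAL mirrors the current Hive-syntax rule and is an assumption of this sketch.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TableMeta:
    location: str
    external: bool


class FileCatalog:
    """Hypothetical file-source catalog used only to illustrate ownership."""

    def __init__(self, warehouse_dir: str) -> None:
        self.warehouse_dir = warehouse_dir
        self.tables: Dict[str, TableMeta] = {}

    def create_table(self, name: str, location: Optional[str] = None,
                     external: bool = False) -> TableMeta:
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        if external and location is None:
            # Assumption: mirror Spark's current rule that EXTERNAL needs a location.
            raise ValueError("EXTERNAL table requires a location")
        # A user-supplied location makes the table external: the catalog
        # does not own that directory.
        is_external = location is not None
        if location is None:
            # Managed table: the catalog chooses (and owns) the directory.
            location = os.path.join(self.warehouse_dir, name)
        os.makedirs(location, exist_ok=True)
        meta = TableMeta(location=location, external=is_external)
        self.tables[name] = meta
        return meta

    def drop_table(self, name: str) -> None:
        meta = self.tables.pop(name)
        if not meta.external:
            # Managed: delete the data along with the metadata.
            shutil.rmtree(meta.location, ignore_errors=True)
        # External: only the metadata entry is removed; files stay in place.
```

Nothing comparable exists for a JDBC or Cassandra table, where the catalog never owned a directory in the first place, which is why EXTERNAL is hard to interpret there.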

For more details, please refer to the long discussion in
https://github.com/apache/spark/pull/28026

Thanks,
Wenchen
