spark-dev mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject [DISCUSS] Resolve ambiguous parser rule between two "create table"s
Date Mon, 16 Mar 2020 02:54:48 GMT
Hi devs,

I'd like to start a discussion and hear your opinions on resolving the ambiguous
parser rules between the two "create table" syntaxes introduced by SPARK-30098 [1].

Previously, the two "create table" parser rules were clearly distinguished by
"USING provider", which was very intuitive and deterministic: a DDL query
creates a Hive table unless "USING provider" is specified (please refer to the
parser rule in branch-2.4 [2]).
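
To make the branch-2.4 behavior concrete, here is a toy Python model. It is hypothetical and is not Spark's actual grammar: a statement is reduced to the set of clause keywords it contains, and only USING, ROW FORMAT and STORED AS are tracked. The point is that a single check, the presence of USING, decides the rule.

```python
from typing import Set

# Toy model (hypothetical; not Spark's actual grammar). A CREATE TABLE
# statement is reduced to the set of clause keywords it contains.
HIVE_ONLY = {"ROW FORMAT", "STORED AS"}


def table_kind_2_4(clauses: Set[str]) -> str:
    """Which branch-2.4 rule a CREATE TABLE statement falls into."""
    if "USING" in clauses:
        # createTable: USING provider is mandatory; Hive-only clauses are not allowed.
        if clauses & HIVE_ONLY:
            raise ValueError("USING cannot be combined with ROW FORMAT / STORED AS")
        return "datasource"
    # createHiveTable: every statement without USING.
    return "hive"


print(table_kind_2_4({"USING"}))      # datasource
print(table_kind_2_4(set()))          # hive
print(table_kind_2_4({"STORED AS"}))  # hive
```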

After SPARK-30098, the "create table" parser rules became ambiguous (please
refer to the parser rule in branch-3.0 [3]): the only factors differentiating
the two rules are "ROW FORMAT" and "STORED AS", both of which are defined as
optional. The outcome now relies on the order of the parser rules, which end
users have no way to reason about, and which is very unintuitive.
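
A toy Python model (hypothetical; not Spark's grammar) can illustrate the branch-3.0 situation. It assumes, as described above, that "ROW FORMAT" and "STORED AS" are the only clauses separating the two rules, and that ANTLR resolves input matching both alternatives in favor of the alternative listed first. The result then depends on the order of the checks, not on anything visible in the query.

```python
from typing import Set

# Toy model (hypothetical; not Spark's actual grammar): only the clauses
# that separate the two branch-3.0 rules are tracked.
HIVE_ONLY = {"ROW FORMAT", "STORED AS"}


def matches_create_table(clauses: Set[str]) -> bool:
    # First alternative: USING is now optional; Hive-only clauses are not allowed.
    return not (clauses & HIVE_ONLY)


def matches_create_hive_table(clauses: Set[str]) -> bool:
    # Second alternative: any statement without USING.
    return "USING" not in clauses


def table_kind_3_0(clauses: Set[str]) -> str:
    # Input that matches both alternatives goes to the one listed first,
    # so the order of these checks decides the outcome.
    if matches_create_table(clauses):
        return "datasource"
    if matches_create_hive_table(clauses):
        return "hive"
    raise ValueError("no CREATE TABLE rule matches: %s" % sorted(clauses))


print(table_kind_3_0(set()))          # datasource
print(table_kind_3_0({"STORED AS"}))  # hive
print(table_kind_3_0({"USING"}))      # datasource
```

Note that `table_kind_3_0(set())`, a statement with neither USING nor any Hive-only clause, matches both alternatives and yields a data source table only because of rule order, whereas in branch-2.4 the same statement creates a Hive table.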

Furthermore, the undocumented EXTERNAL rule (added to the first rule to
provide a better error message) brings more confusion (I've described an
existing query that it breaks in SPARK-30436 [4]).

Personally, I'd like to see the two rules made mutually exclusive, instead of
trying to document the difference and telling end users to be careful about
their queries. I see two ways to make the rules mutually exclusive:

1. Add an identifier to the create Hive table rule, like `CREATE ... "HIVE"
TABLE ...`.

pros. This is the simplest way to distinguish between the two rules.
cons. This would require end users to change their queries if they intend to
create a Hive table. (Given that we will also provide a legacy option, I feel
this is acceptable.)

2. Make "ROW FORMAT" or "STORED AS" mandatory.

pros. Less invasive for existing queries.
cons. Less intuitive, because these clauses have always been optional and would
now become mandatory for a query to fall into the second rule.
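
One way to compare the two options is to check mutual exclusivity directly. The Python sketch below is a toy model under stated assumptions, not Spark's grammar: each rule is reduced to a predicate over the set of clauses a statement contains, "HIVE" stands for the marker proposed in option 1, and only the clauses discussed here are tracked. It lists every clause combination that both rules accept; an empty list means the rules are mutually exclusive.

```python
from itertools import combinations
from typing import Callable, List, Set

# Toy model (hypothetical; not Spark's grammar). "HIVE" stands for the
# marker proposed in option 1 (`CREATE ... HIVE TABLE ...`).
CLAUSES = ["USING", "ROW FORMAT", "STORED AS", "HIVE"]
HIVE_ONLY = {"ROW FORMAT", "STORED AS"}

Rule = Callable[[Set[str]], bool]


def datasource_rule(c: Set[str]) -> bool:
    # Same in all variants: USING optional, no Hive-only clauses, no HIVE marker.
    return not (c & HIVE_ONLY) and "HIVE" not in c


def hive_rule_3_0(c: Set[str]) -> bool:
    # Current branch-3.0: every Hive clause is optional.
    return "USING" not in c and "HIVE" not in c


def hive_rule_option_1(c: Set[str]) -> bool:
    # Option 1: the HIVE marker is required (assumed: no USING alongside it).
    return "HIVE" in c and "USING" not in c


def hive_rule_option_2(c: Set[str]) -> bool:
    # Option 2: ROW FORMAT or STORED AS is required.
    return bool(c & HIVE_ONLY) and "USING" not in c and "HIVE" not in c


def ambiguous_inputs(first: Rule, second: Rule) -> List[Set[str]]:
    """Return every clause combination accepted by both rules."""
    found = []
    for size in range(len(CLAUSES) + 1):
        for combo in combinations(CLAUSES, size):
            clauses = set(combo)
            if first(clauses) and second(clauses):
                found.append(clauses)
    return found


for name, hive_rule in [("branch-3.0", hive_rule_3_0),
                        ("option 1", hive_rule_option_1),
                        ("option 2", hive_rule_option_2)]:
    print(name, ambiguous_inputs(datasource_rule, hive_rule))
# branch-3.0 [set()]
# option 1 []
# option 2 []
```

Under this model, the only statement both branch-3.0 rules accept is the one with none of the tracked clauses (e.g. a plain `CREATE TABLE t (a INT)`), which is exactly where rule order decides the outcome; either option removes that overlap.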

I'd like to hear everyone's opinions; better ideas are welcome!

Thanks,
Jungtaek Lim (HeartSaVioR)

1. SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
https://issues.apache.org/jira/browse/SPARK-30098
2. SqlBase.g4 in branch-2.4
https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
3. SqlBase.g4 in branch-3.0
https://github.com/apache/spark/blob/branch-3.0/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
4. SPARK-30436
https://issues.apache.org/jira/browse/SPARK-30436
