spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject [DISCUSS] Syntax for table DDL
Date Fri, 28 Sep 2018 22:46:30 GMT
Hi everyone,

I’m currently working on new table DDL statements for v2 tables. For
context, the new logical plans for DataSourceV2 require a catalog interface
so that Spark can create tables for operations like CTAS. The proposed
TableCatalog API also includes an API for altering those tables so we can
make ALTER TABLE statements work. I’m implementing those DDL statements,
which will make it into upstream Spark when the TableCatalog PR is merged.

Since I’m adding new SQL statements that don’t yet exist in Spark, I want
to make sure that the syntax I’m using in our branch will match the syntax
we add to Spark later. I’m basing this proposed syntax on PostgreSQL
<https://www.postgresql.org/docs/current/static/ddl-alter.html>.

   - *Update data type*: ALTER TABLE tableIdentifier ALTER COLUMN
   qualifiedName TYPE dataType
   - *Rename column*: ALTER TABLE tableIdentifier RENAME COLUMN
   qualifiedName TO qualifiedName
   - *Drop column*: ALTER TABLE tableIdentifier DROP (COLUMN | COLUMNS)
   qualifiedNameList
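To make the three forms concrete, here is a minimal, hypothetical recognizer for them. It is a regex sketch, not Spark's parser (which is ANTLR-based). The names `parse_alter`, `AlterColumnType`, `RenameColumn` and `DropColumns` are illustrative only, and it assumes unquoted identifiers (no backticks). Qualified names come back as tuples of parts, so `point.x` becomes `("point", "x")`.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical token patterns: unquoted identifiers only, dot-separated.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
QUALIFIED = rf"{IDENT}(?:\.{IDENT})*"


@dataclass
class AlterColumnType:
    table: str
    column: Tuple[str, ...]
    data_type: str


@dataclass
class RenameColumn:
    table: str
    column: Tuple[str, ...]
    new_name: Tuple[str, ...]


@dataclass
class DropColumns:
    table: str
    columns: List[Tuple[str, ...]]


def _parts(name: str) -> Tuple[str, ...]:
    """Split a qualified name like 'point.x' into ('point', 'x')."""
    return tuple(name.split("."))


_ALTER_TYPE = re.compile(
    rf"ALTER\s+TABLE\s+({QUALIFIED})\s+ALTER\s+COLUMN\s+({QUALIFIED})\s+TYPE\s+(.+)",
    re.IGNORECASE)
# Update and rename take exactly one qualifiedName; drop takes a list.
_RENAME = re.compile(
    rf"ALTER\s+TABLE\s+({QUALIFIED})\s+RENAME\s+COLUMN\s+({QUALIFIED})\s+TO\s+({QUALIFIED})",
    re.IGNORECASE)
_DROP = re.compile(
    rf"ALTER\s+TABLE\s+({QUALIFIED})\s+DROP\s+COLUMNS?\s+({QUALIFIED}(?:\s*,\s*{QUALIFIED})*)",
    re.IGNORECASE)


def parse_alter(sql: str):
    """Return a command object for one of the three proposed forms."""
    stmt = sql.strip().rstrip(";").strip()
    if m := _ALTER_TYPE.fullmatch(stmt):
        return AlterColumnType(m.group(1), _parts(m.group(2)), m.group(3).strip())
    if m := _RENAME.fullmatch(stmt):
        return RenameColumn(m.group(1), _parts(m.group(2)), _parts(m.group(3)))
    if m := _DROP.fullmatch(stmt):
        names = [n.strip() for n in m.group(2).split(",")]
        return DropColumns(m.group(1), [_parts(n) for n in names])
    raise ValueError(f"unsupported statement: {sql!r}")
```

For example, `parse_alter("ALTER TABLE db.t DROP COLUMNS a, point.x")` yields a `DropColumns` whose `columns` are `[("a",), ("point", "x")]`. A multi-column rename such as `RENAME COLUMN a, b TO c` is rejected because the rename pattern only accepts a single qualified name.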

A few notes:

   - Using qualifiedName in these rules allows updating nested types, like
   point.x.
   - Updates and renames can only alter one column, but drop can drop a
   list.
   - Rename can’t move a field to a different struct: if the TO name is
   qualified, rename validates that its prefix matches the original field’s.
   - I’m also changing ADD COLUMN to support adding fields to nested
   columns by using qualifiedName instead of identifier.
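The notes above can be illustrated with a small sketch of how qualified names might be applied to a nested schema. This is an assumption-laden model, not the TableCatalog API: a struct is represented as a plain dict mapping field names to either a type name or a nested dict, and `add_column`, `rename_column` and `drop_columns` are hypothetical helpers. Each returns a new schema, so a failed validation leaves the input untouched.

```python
import copy
from typing import Dict, List, Tuple, Union

# Hypothetical schema model for illustration only (not Spark's StructType):
# a struct is a dict mapping field name -> type name (str) or nested struct (dict).
Schema = Dict[str, Union[str, dict]]
Path = Tuple[str, ...]


def _parent(schema: Schema, path: Path) -> Schema:
    """Walk the prefix of `path` and return the struct holding its last field."""
    struct = schema
    for name in path[:-1]:
        child = struct.get(name)
        if not isinstance(child, dict):
            raise ValueError(f"{'.'.join(path)}: {name!r} is not a struct")
        struct = child
    return struct


def add_column(schema: Schema, path: Path, data_type: str) -> Schema:
    """ADD COLUMN with a qualified name adds a field to a possibly nested struct."""
    result = copy.deepcopy(schema)
    parent = _parent(result, path)
    if path[-1] in parent:
        raise ValueError(f"{'.'.join(path)} already exists")
    parent[path[-1]] = data_type
    return result


def rename_column(schema: Schema, path: Path, new_name: Path) -> Schema:
    """RENAME COLUMN: a qualified TO name must share the original field's prefix."""
    if len(new_name) > 1 and new_name[:-1] != path[:-1]:
        raise ValueError("rename cannot move a field to a different struct")
    result = copy.deepcopy(schema)
    parent = _parent(result, path)
    old, new = path[-1], new_name[-1]
    if old not in parent:
        raise ValueError(f"{'.'.join(path)} does not exist")
    if new in parent:
        raise ValueError(f"{new!r} already exists")
    # Rebuild the struct so the renamed field keeps its original position.
    items = list(parent.items())
    parent.clear()
    parent.update((new if k == old else k, v) for k, v in items)
    return result


def drop_columns(schema: Schema, paths: List[Path]) -> Schema:
    """DROP COLUMNS: unlike update and rename, drop accepts a list of names."""
    result = copy.deepcopy(schema)
    for path in paths:
        parent = _parent(result, path)
        if path[-1] not in parent:
            raise ValueError(f"{'.'.join(path)} does not exist")
        del parent[path[-1]]
    return result
```

With `s = {"id": "bigint", "point": {"x": "double", "y": "double"}}`, both `rename_column(s, ("point", "x"), ("point", "lon"))` and the unqualified form `rename_column(s, ("point", "x"), ("lon",))` rename the nested field in place. `rename_column(s, ("point", "x"), ("other", "x"))` is rejected because its prefix differs from the original field's.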

Please reply to this thread if you have suggestions based on a different
SQL engine or want this syntax to be different for another reason. Thanks!

rb
-- 
Ryan Blue
Software Engineer
Netflix
