spark-dev mailing list archives

From Gengliang Wang <gengliang.w...@databricks.com>
Subject [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default
Date Wed, 04 Sep 2019 05:59:37 GMT
Hi everyone,

I'd like to call for a vote on SPARK-28885
<https://issues.apache.org/jira/browse/SPARK-28885> "Follow ANSI store
assignment rules in table insertion by default".
When inserting a value into a column with a different data type,
Spark performs type coercion. Currently, we support 3 policies for the
type coercion rules: ANSI, legacy and strict, which can be set via the
option "spark.sql.storeAssignmentPolicy":
1. ANSI: Spark performs the type coercion as per ANSI SQL. In
practice, the behavior is mostly the same as PostgreSQL. It disallows
certain unreasonable type conversions such as converting `string` to
`int` and `double` to `boolean`.
2. Legacy: Spark allows the type coercion as long as it is a valid
`Cast`, which is very loose. E.g., converting either `string` to `int`
or `double` to `boolean` is allowed. It is the current behavior in
Spark 2.x for compatibility with Hive.
3. Strict: Spark doesn't allow any possible precision loss or data
truncation in type coercion, e.g., converting either `double` to `int`
or `decimal` to `double` is not allowed. The rules were originally
designed for the Dataset encoder. As far as I know, no mainstream DBMS
uses this policy by default.
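
As a rough illustration only, the examples in the list above can be
tabulated in a small Python lookup. This is not Spark's implementation:
the function name `stated_behavior` and the table layout are hypothetical,
and the table records only the conversions named in this message. The
Strict examples are entered as rejected, since that policy forbids
precision loss and truncation.

```python
from typing import Optional

# Only the conversions explicitly named in this message are recorded.
# Key: (policy, source type, target type) -> True if allowed, False if rejected.
STATED = {
    ("ANSI", "string", "int"): False,
    ("ANSI", "double", "boolean"): False,
    ("LEGACY", "string", "int"): True,
    ("LEGACY", "double", "boolean"): True,
    ("STRICT", "double", "int"): False,      # possible truncation
    ("STRICT", "decimal", "double"): False,  # possible precision loss
}


def stated_behavior(policy: str, source: str, target: str) -> Optional[bool]:
    """Return True/False if the message states the outcome, else None."""
    return STATED.get((policy.upper(), source, target))


if __name__ == "__main__":
    for policy in ("ANSI", "LEGACY", "STRICT"):
        print(policy, "string -> int:", stated_behavior(policy, "string", "int"))
```

A result of `None` means this message does not say how that policy
treats the conversion; it should not be read as "allowed".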

Currently, the V1 data source uses "Legacy" policy by default, while
V2 uses "Strict". This proposal is to use "ANSI" policy by default for
both V1 and V2 in Spark 3.0.

There was also a DISCUSS thread "Follow ANSI SQL on table insertion"
in the dev mailing list.

This vote is open until next Thursday (Sept. 12th).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!

Gengliang
