I'd like to call for a vote on SPARK-28885 "Follow ANSI store assignment rules in table insertion by default".
When inserting a value into a column with a different data type, Spark performs type coercion. Currently, we support three policies for the type coercion rules: ANSI, legacy and strict, which can be set via the option "spark.sql.storeAssignmentPolicy":
1. ANSI: Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean`.
2. Legacy: Spark allows the type coercion as long as it is a valid `Cast`, which is very loose. E.g., converting either `string` to `int` or `double` to `boolean` is allowed. It is the current behavior in Spark 2.x for compatibility with Hive.
3. Strict: Spark doesn't allow any possible precision loss or data truncation in type coercion, e.g., converting either `double` to `int` or `decimal` to `double` is not allowed. The rules were originally designed for the Dataset encoder. As far as I know, no mainstream DBMS uses this policy by default.
Currently, the V1 data source uses the "Legacy" policy by default, while V2 uses "Strict". This proposal is to use the "ANSI" policy by default for both V1 and V2 in Spark 3.0.
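To make the difference between the policies easier to compare, here is a rough sketch in plain Python that encodes only the example conversions mentioned in the list above. It is a toy lookup, not Spark code: the names `ALLOWED` and `can_assign` are made up for illustration, and any conversion this mail does not mention returns `None` rather than a guess.

```python
# Toy model of the three store assignment policies.
# It covers ONLY the conversions named in this mail; it is not Spark's
# implementation, and ALLOWED / can_assign are hypothetical names.
ALLOWED = {
    # ANSI: rejects certain unreasonable conversions.
    ("ANSI", "string", "int"): False,
    ("ANSI", "double", "boolean"): False,
    # Legacy: anything that is a valid Cast is accepted.
    ("LEGACY", "string", "int"): True,
    ("LEGACY", "double", "boolean"): True,
    # Strict: no possible precision loss or truncation.
    ("STRICT", "double", "int"): False,
    ("STRICT", "decimal", "double"): False,
}


def can_assign(policy, source, target):
    """Return True or False for pairs named in the mail, None otherwise."""
    return ALLOWED.get((policy.upper(), source, target))


if __name__ == "__main__":
    for (policy, source, target), ok in ALLOWED.items():
        verdict = "allowed" if ok else "rejected"
        print(f"{policy}: {source} -> {target} is {verdict}")
```

For example, `can_assign("legacy", "string", "int")` is `True`, while the same pair under `"ansi"` is `False`.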
There was also a DISCUSS thread "Follow ANSI SQL on table insertion" on the dev mailing list.
This vote is open until next Thursday (Sept. 12th).
[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...