spark-dev mailing list archives

From "Reynold Xin" <r...@databricks.com>
Subject Re: [Discuss] Follow ANSI SQL on table insertion
Date Wed, 31 Jul 2019 16:57:36 GMT
Matt, what do you mean by maximizing standard 3 while also not throwing errors when operations
overflow? Those two seem contradictory.

On Wed, Jul 31, 2019 at 9:55 AM, Matt Cheah < mcheah@palantir.com > wrote:

> 
> 
> 
> I’m -1, simply from disagreeing with the premise that we can afford to not
> be maximal on standard 3. The correctness of the data is non-negotiable,
> and whatever solution we settle on cannot silently adjust the user’s data
> under any circumstances.
> 
> 
> 
>  
> 
> 
> 
> I think the existing behavior is fine, or perhaps the behavior can be
> flagged by the destination writer at write time.
> 
> 
> 
>  
> 
> 
> 
> -Matt Cheah
> 
> 
> 
>  
> 
> 
> 
> *From:* Hyukjin Kwon < gurwls223@gmail.com >
> *Date:* Monday, July 29, 2019 at 11:33 PM
> *To:* Wenchen Fan < cloud0fan@gmail.com >
> *Cc:* Russell Spitzer < russell.spitzer@gmail.com >, Takeshi Yamamuro < linguin.m.s@gmail.com
> >, Gengliang Wang < gengliang.wang@databricks.com >, Ryan Blue < rblue@netflix.com
> >, Spark dev list < dev@spark.apache.org >
> *Subject:* Re: [Discuss] Follow ANSI SQL on table insertion
> 
> 
> 
> 
>  
> 
> 
> 
> 
> From my look, +1 on the proposal, considering ANSI and other DBMSes in
> general.
> 
> 
> 
> 
>  
> 
> 
> 
> On Tue, Jul 30, 2019 at 3:21 PM Wenchen Fan < cloud0fan@gmail.com > wrote:
> 
> 
> 
> 
>> 
>> 
>> We can add a config for a certain behavior if it makes sense, but the most
>> important thing we want to reach agreement on here is: what should the
>> default behavior be?
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> 
>> Let's explore the solution space of table insertion behavior first:
>> 
>> 
>> 
>> 
>> At compile time,
>> 
>> 
>> 
>> 
>> 1. always add cast
>> 
>> 
>> 
>> 
>> 2. add cast following the ANSI SQL store assignment rule (e.g. string to
>> int is forbidden but long to int is allowed)
>> 
>> 
>> 
>> 
>> 3. only add cast if it's 100% safe
>> 
>> 
>> 
>> 
>> At runtime,
>> 
>> 
>> 
>> 
>> 1. return null for invalid operations
>> 
>> 
>> 
>> 
>> 2. throw exceptions at runtime for invalid operations
>> 
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> 
>> The standards to evaluate a solution:
>> 
>> 
>> 
>> 
>> 1. How robust the query execution is. For example, users usually don't
>> want to see the query fail midway.
>> 
>> 
>> 
>> 
>> 2. How tolerant it is of user queries. For example, a user may want to write
>> long values to an int column because he knows all the long values won't
>> exceed the int range.
>> 
>> 
>> 
>> 
>> 3. How clean the result is. For example, users usually don't want to see
>> silently corrupted data (null values).
>> 
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> 
>> The current Spark behavior for Data Source V1 tables: always add cast and
>> return null for invalid operations. This maximizes standard 1 and 2, but
>> the result is least clean and users are very likely to see silently
>> corrupted data (null values).
>> 
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> 
>> The current Spark behavior for Data Source V2 tables (new in Spark 3.0):
>> only add cast if it's 100% safe. This maximizes standard 1 and 3, but many
>> queries may fail to compile, even though these queries can run on other SQL
>> systems. Note that people can still see silently corrupted data, because
>> cast is not the only operation that can return corrupted data. Simple
>> operations like ADD can also return corrupted data if overflow happens, e.g.
>> INSERT INTO t1 (intCol) SELECT anotherIntCol + 100 FROM t2
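>> 
>> As a minimal sketch of that overflow point, assuming a local SparkSession and
>> the default non-ANSI arithmetic (the table and column names above are just
>> placeholders):
>> 
>> import org.apache.spark.sql.SparkSession
>> 
>> object OverflowSketch {
>>   def main(args: Array[String]): Unit = {
>>     val spark = SparkSession.builder().master("local[*]").appName("overflow-sketch").getOrCreate()
>>     // With default (non-ANSI) arithmetic, Int.MaxValue + 100 silently wraps around
>>     // to a negative number instead of failing, so the value headed for intCol is
>>     // already corrupted before any insertion cast is applied.
>>     spark.sql("SELECT CAST(2147483647 AS INT) + 100 AS v").show()
>>     spark.stop()
>>   }
>> }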
>> 
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> 
>> The proposal here: add cast following the ANSI SQL store assignment rule, and
>> return null for invalid operations. This maximizes standard 1, and also
>> fits standard 2 well: if a query can't compile in Spark, it usually can't
>> compile in other mainstream databases either. I think that's tolerant
>> enough. For standard 3, this proposal doesn't maximize it, but it already
>> avoids many invalid operations.
>> 
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> 
>> Technically we can't make the result 100% clean at compile time; we have
>> to handle things like overflow at runtime. I think the new proposal makes
>> more sense as the default behavior.
>> 
>> 
>> 
>> 
>>   
>> 
>> 
>> 
>> 
>>  
>> 
>> 
>> 
>> On Mon, Jul 29, 2019 at 8:31 PM Russell Spitzer < russell.spitzer@gmail.com
>> > wrote:
>> 
>> 
>> 
>>> 
>>> 
>>> I understand Spark is making the decisions; I'm saying the actual final
>>> effect of the null decision would differ depending on the insertion
>>> target, if the target has different behaviors for null.
>>> 
>>> 
>>> 
>>> 
>>>  
>>> 
>>> 
>>> 
>>> On Mon, Jul 29, 2019 at 5:26 AM Wenchen Fan < cloud0fan@gmail.com > wrote:
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> 
>>>> > I'm a big -1 on null values for invalid casts.
>>>> 
>>>> 
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>> 
>>>> This is why we want to introduce the ANSI mode, so that invalid casts fail
>>>> at runtime. But we have to keep the null behavior for a while, to keep
>>>> backward compatibility. Spark has returned null for invalid casts since the
>>>> first day of Spark SQL; we can't just change that without a way to restore
>>>> the old behavior.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>> 
>>>> I'm OK with adding a strict mode for the upcast behavior in table
>>>> insertion, but I don't agree with making it the default. The default
>>>> behavior should be either the ANSI SQL behavior or the legacy Spark
>>>> behavior.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>> 
>>>> > other modes should be allowed only with strict warning the behavior will
>>>> > be determined by the underlying sink.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Seems there is some misunderstanding. The table insertion behavior is
>>>> fully controlled by Spark. Spark decides when to add casts, and Spark
>>>> decides whether an invalid cast should return null or fail. The sink is only
>>>> responsible for writing data, not the type coercion/cast stuff.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>> On Sun, Jul 28, 2019 at 12:24 AM Russell Spitzer < russell.spitzer@gmail.com
>>>> > wrote:
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> I'm a big -1 on null values for invalid casts. This can lead to a lot of
>>>>> even more unexpected errors and runtime behavior, since null is:
>>>>> 
>>>>> 
>>>>> 
>>>>>  
>>>>> 
>>>>> 
>>>>> 
>>>>> 1. Not allowed in all schemas (Leading to a runtime error anyway)
>>>>> 2. Is the same as delete in some systems (leading to data loss)
>>>>> 
>>>>> And this would be dependent on the sink being used. Spark won't just be
>>>>> interacting with ANSI-compliant sinks, so I think it makes much more sense
>>>>> to be strict. I think Upcast mode is a sensible default, and other modes
>>>>> should be allowed only with a strict warning that the behavior will be
>>>>> determined by the underlying sink.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>  
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Jul 27, 2019 at 8:05 AM Takeshi Yamamuro < linguin.m.s@gmail.com > wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Hi, all
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> +1 for implementing this new store cast mode.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From a viewpoint of DBMS users, this cast is pretty common for INSERTs, and
>>>>>> I think this functionality could promote migrations from existing DBMSs to
>>>>>> Spark.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The most important thing for DBMS users is that they could optionally
>>>>>> choose this mode when inserting data. Therefore, I think it might be okay
>>>>>> for the two modes (the current upcast mode and the proposed store cast mode)
>>>>>> to co-exist for INSERTs. (There is room to discuss which mode is enabled
>>>>>> by default, though...)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> IMHO we'll provide the three behaviours below for INSERTs:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  - upcast mode
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  - ANSI store cast mode and runtime exceptions thrown for invalid values
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  - ANSI store cast mode and null filled for invalid values
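>>>>>> 
>>>>>> A minimal sketch of how such a choice could be exposed to users, assuming a
>>>>>> store assignment policy conf like spark.sql.storeAssignmentPolicy (the conf
>>>>>> name and its values are assumptions here, not something decided in this thread):
>>>>>> 
>>>>>> import org.apache.spark.sql.SparkSession
>>>>>> 
>>>>>> object StoreAssignmentSketch {
>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>     val spark = SparkSession.builder().master("local[*]").appName("store-assignment").getOrCreate()
>>>>>>     spark.sql("CREATE TABLE t (i INT) USING parquet")
>>>>>>     // Assumed policy values: "ANSI" would apply the store cast (long -> int allowed,
>>>>>>     // string -> int rejected at analysis time), while "STRICT" would keep the
>>>>>>     // upcast-only behavior and reject the bigint insert below as well.
>>>>>>     spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
>>>>>>     spark.sql("INSERT INTO t SELECT CAST(1 AS BIGINT)")
>>>>>>     spark.stop()
>>>>>>   }
>>>>>> }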
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sat, Jul 27, 2019 at 8:03 PM Gengliang Wang < gengliang.wang@databricks.com
>>>>>> > wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Ryan,
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks for the suggestions on the proposal and doc.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Currently, there is no data type validation in table insertion in V1. We
>>>>>>> are on the same page that we should improve it. But using UpCast goes from
>>>>>>> one extreme to the other. It is possible that many queries will be broken
>>>>>>> after upgrading to Spark 3.0.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> The rules of UpCast are too strict. E.g. it doesn't allow assigning
>>>>>>> Timestamp type to Date type, as there will be "precision loss". To me, the
>>>>>>> type coercion is reasonable and the "precision loss" is expected.
>>>>>>> This is very common in other SQL engines.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> As long as Spark is following the ANSI SQL store assignment rules, it is
>>>>>>> users' responsibility to take good care of the type coercion in data
>>>>>>> writing. I think it's the right decision.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> > But the new behavior is only applied in DataSourceV2, so it won't affect
>>>>>>> > existing jobs until sources move to v2 and break other behavior anyway.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Eventually, most sources are supposed to be migrated to DataSourceV2. I
>>>>>>> think we can discuss and make a decision now.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> > Fixing the silent corruption by adding a runtime exception is not a good
>>>>>>> > option, either.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> The new optional mode proposed in https://issues.apache.org/jira/browse/SPARK-28512
>>>>>>> is disabled by default. This should be fine.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Jul 27, 2019 at 10:23 AM Wenchen Fan < cloud0fan@gmail.com > wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I don't agree with handling literal values specially. Although Postgres
>>>>>>>> does it, I can't find anything about it in the SQL standard. And it
>>>>>>>> introduces inconsistent behaviors which may be strange to users:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> * What about something like "INSERT INTO t SELECT float_col + 1.1"?
>>>>>>>> * The same insert with a decimal column as input will fail even when a
>>>>>>>> decimal literal would succeed.
>>>>>>>> * Similar insert queries with "literal" inputs can be constructed through
>>>>>>>> layers of indirection via views, inline views, CTEs, unions, etc. Would
>>>>>>>> those decimals be treated as columns and fail, or would we attempt to make
>>>>>>>> them succeed as well? Would users find this behavior surprising?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Silently corrupt data is bad, but this is the decision we made at the
>>>>>>>> beginning when designing Spark's behavior. Whenever an error occurs, Spark
>>>>>>>> attempts to return null instead of throwing a runtime exception. Recently we
>>>>>>>> provided configs to make Spark fail at runtime for overflow, but that's
>>>>>>>> another story. Silently corrupt data is bad, runtime exceptions are bad, and
>>>>>>>> forbidding all the table insertions that may fail (even with very little
>>>>>>>> possibility) is also bad. We have to make trade-offs. The trade-offs we
>>>>>>>> made in this proposal are:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> * forbid table insertions that are very likely to fail, at compile time
>>>>>>>> (things like writing string values to an int column)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> * allow table insertions that are not that likely to fail. If the data is
>>>>>>>> wrong, don't fail, insert null.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> * provide a config to fail the insertion at runtime if the data is wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> > But the new behavior is only applied in DataSourceV2, so it won't
>>>>>>>> > affect existing jobs until sources move to v2 and break other behavior
>>>>>>>> > anyway.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> When users write SQL queries, they don't care if a table is backed by Data
>>>>>>>> Source V1 or V2. We should make sure the table insertion behavior is
>>>>>>>> consistent and reasonable. Furthermore, users may not even care whether the
>>>>>>>> SQL queries run in Spark or another RDBMS; it's better to follow the SQL
>>>>>>>> standard instead of introducing a Spark-specific behavior.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We are not talking about a small use case like allowing writing a decimal
>>>>>>>> literal to a float column; we are talking about the bigger goal of making
>>>>>>>> Spark compliant with the SQL standard, w.r.t.
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-26217 . This proposal is a
>>>>>>>> sub-task of it, to make the table insertion behavior follow the SQL
>>>>>>>> standard.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sat, Jul 27, 2019 at 1:35 AM Ryan Blue < rblue@netflix.com > wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I don’t think this is a good idea. Following the ANSI standard is usually
>>>>>>>>> fine, but here it would *silently corrupt data*.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> From your proposal doc, ANSI allows implicitly casting from long to int (any
>>>>>>>>> numeric type to any other numeric type) and inserts NULL when a value
>>>>>>>>> overflows. That would drop data values and is not safe.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Fixing the silent corruption by adding a runtime exception is not a good
>>>>>>>>> option, either. That puts off the problem until much of the job has
>>>>>>>>> completed, instead of catching the error at analysis time. It is better to
>>>>>>>>> catch this earlier during analysis than to run most of a job and then
>>>>>>>>> fail.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In addition, part of the justification for using the ANSI standard is to
>>>>>>>>> avoid breaking existing jobs. But the new behavior is only applied in
>>>>>>>>> DataSourceV2, so it won’t affect existing jobs until sources move to v2
>>>>>>>>> and break other behavior anyway.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I think that the correct solution is to go with the existing validation
>>>>>>>>> rules that require explicit casts to truncate values.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> That still leaves the use case that motivated this proposal, which is that
>>>>>>>>> floating point literals are parsed as decimals and fail simple insert
>>>>>>>>> statements. We already came up with two alternatives to fix that problem
>>>>>>>>> in the DSv2 sync, and I think it is a better idea to go with one of those
>>>>>>>>> instead of “fixing” Spark in a way that will corrupt data or cause runtime
>>>>>>>>> failures.
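>>>>>>>>> 
>>>>>>>>> A minimal sketch of the failing pattern described above, assuming a table
>>>>>>>>> where the strict upcast check applies (the table name and provider are
>>>>>>>>> placeholders):
>>>>>>>>> 
>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>> 
>>>>>>>>> object FloatLiteralSketch {
>>>>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>>>>     val spark = SparkSession.builder().master("local[*]").appName("float-literal").getOrCreate()
>>>>>>>>>     spark.sql("CREATE TABLE f (x FLOAT) USING parquet")
>>>>>>>>>     // The literal 1.1 is parsed as DECIMAL(2,1); under upcast-only validation a
>>>>>>>>>     // decimal cannot be upcast to FLOAT, so this simple insert is rejected at
>>>>>>>>>     // analysis time even though the value is representable.
>>>>>>>>>     spark.sql("INSERT INTO f VALUES (1.1)")
>>>>>>>>>     spark.stop()
>>>>>>>>>   }
>>>>>>>>> }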
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Jul 25, 2019 at 9:11 AM Wenchen Fan < cloud0fan@gmail.com > wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I have heard many complaints about the old table insertion behavior.
>>>>>>>>>> Blindly casting everything will leak the user's mistake to a late stage of
>>>>>>>>>> the data pipeline and make it very hard to debug. When a user writes
>>>>>>>>>> string values to an int column, it's probably a mistake, e.g. the columns
>>>>>>>>>> are misordered in the INSERT statement. We should fail the query earlier
>>>>>>>>>> and ask users to fix the mistake.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Meanwhile, I agree that the new table insertion behavior we
>>>>>>>>>> introduced for Data Source V2 is too strict. It may fail valid queries
>>>>>>>>>> unexpectedly.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> In general, I support the direction of following the ANSI SQL standard.
>>>>>>>>>> But I'd like to do it in 2 steps:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 1. only add cast when the assignment rule is satisfied. This should be the
>>>>>>>>>> default behavior, and we should provide a legacy config to restore the
>>>>>>>>>> old behavior.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2. fail the cast operation at runtime if overflow happens. AFAIK Marco
>>>>>>>>>> Gaido is working on it already. This will have a config as well, and by
>>>>>>>>>> default we still return null.
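>>>>>>>>>> 
>>>>>>>>>> A minimal sketch of what step 2 could look like from the user side, assuming
>>>>>>>>>> the runtime check ends up behind a SQL conf such as spark.sql.ansi.enabled
>>>>>>>>>> (that conf name is an assumption here):
>>>>>>>>>> 
>>>>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>>>>> 
>>>>>>>>>> object RuntimeCastSketch {
>>>>>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>>>>>     val spark = SparkSession.builder().master("local[*]").appName("runtime-cast").getOrCreate()
>>>>>>>>>>     // Default behavior: an invalid cast silently becomes null.
>>>>>>>>>>     spark.sql("SELECT CAST('abc' AS INT) AS v").show()
>>>>>>>>>>     // Assumed strict mode: the same cast would fail at runtime instead of returning null.
>>>>>>>>>>     spark.conf.set("spark.sql.ansi.enabled", "true")
>>>>>>>>>>     spark.sql("SELECT CAST('abc' AS INT) AS v").show()
>>>>>>>>>>     spark.stop()
>>>>>>>>>>   }
>>>>>>>>>> }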
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> After doing this, the default behavior will be slightly different from the
>>>>>>>>>> SQL standard (cast can return null), and users can turn on the ANSI mode
>>>>>>>>>> to fully follow the SQL standard. This is much better than before and
>>>>>>>>>> should prevent a lot of user mistakes. It's also a reasonable choice to me
>>>>>>>>>> to not throw exceptions at runtime by default, as it's usually bad for
>>>>>>>>>> long-running jobs.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Wenchen 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, Jul 25, 2019 at 11:37 PM Gengliang Wang < gengliang.wang@databricks.com > wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>  
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I would like to discuss the table insertion behavior of Spark. In the
>>>>>>>>>>> current data source V2, only UpCast is allowed for table insertion. I
>>>>>>>>>>> think following ANSI SQL is a better idea.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> For more information, please read the doc Discuss: Follow ANSI SQL on
>>>>>>>>>>> table insertion (
>>>>>>>>>>> https://docs.google.com/document/d/1b9nnWWbKVDRp7lpzhQS1buv1_lDzWIZY2ApFs5rBcGI/edit?usp=sharing
>>>>>>>>>>> )
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Please let me know if you have any thoughts on this.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>  
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Gengliang
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Ryan Blue
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Software Engineer
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Netflix
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ---
>>>>>> Takeshi Yamamuro
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
>