spark-dev mailing list archives

From Yuanjian Li <xyliyuanj...@gmail.com>
Subject Re: [DISCUSS] Disable streaming query with possible correctness issue by default
Date Wed, 11 Nov 2020 09:14:39 GMT
Already +1 in the PR. It would be great to mention the new config in the
Structured Streaming (SS) migration guide.
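The failure mode behind SPARK-33259 can be shown with a small toy model. This is a simplified sketch under stated assumptions, not Spark's implementation. A hypothetical upstream tumbling-window count in append mode emits a window only after the single global watermark passes the window's end. A hypothetical downstream stateful operator drops rows whose event time is behind that same watermark. The window size, the delay and the `run` function are all illustrative.

```python
from collections import defaultdict

WINDOW = 10  # illustrative tumbling window size (seconds)
DELAY = 5    # illustrative watermark delay (seconds)


def run(batches):
    """Toy model (hypothetical, not Spark code) of two chained stateful
    operators that share one global watermark.

    Returns (kept, dropped): the (window_start, count) rows emitted by the
    upstream operator that the downstream operator kept or dropped as late.
    """
    watermark = float("-inf")
    max_event_time = float("-inf")
    counts = defaultdict(int)  # upstream state: window start -> count
    kept, dropped = [], []
    for batch in batches:
        # Upstream aggregation: ignore events already behind the watermark,
        # otherwise count the event in its tumbling window.
        for t in batch:
            if t >= watermark:
                counts[t - t % WINDOW] += 1
        # Append mode: emit (and evict) windows whose end has passed the watermark.
        for start in sorted(counts):
            if start + WINDOW <= watermark:
                row = (start, counts.pop(start))
                # Downstream operator: the row's event time (`start`) is checked
                # against the SAME global watermark, so it is always late here.
                (dropped if start < watermark else kept).append(row)
        # Advance the global watermark only after the batch is processed.
        if batch:
            max_event_time = max(max_event_time, max(batch))
        watermark = max(watermark, max_event_time - DELAY)
    return kept, dropped


kept, dropped = run([[1, 2, 12], [25], [40]])
print(kept, dropped)
```

Because a window is emitted only once `start + WINDOW <= watermark`, its event time is always behind the watermark, so in this model the downstream operator discards every row the upstream operator produces. In the Spark 3.1 release, the check proposed in this thread is, as far as I know, controlled by `spark.sql.streaming.statefulOperator.checkCorrectness.enabled` (enabled by default). Verify that name against the Structured Streaming migration guide for your version.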

On Wed, Nov 11, 2020 at 7:48 AM, Ryan Blue <rblue@netflix.com.invalid> wrote:

> +1, I agree with Tom.
>
> On Tue, Nov 10, 2020 at 3:00 PM Dongjoon Hyun <dongjoon.hyun@gmail.com>
> wrote:
>
>> +1 for Apache Spark 3.1.0.
>>
>> Bests,
>> Dongjoon.
>>
>> On Tue, Nov 10, 2020 at 6:17 AM Tom Graves <tgraves_cs@yahoo.com.invalid>
>> wrote:
>>
>>> +1. Since it's a correctness issue, I think it's OK to change the behavior
>>> to make sure the user is aware of it and let them decide.
>>>
>>> Tom
>>>
>>> On Saturday, November 7, 2020, 01:00:11 AM CST, Liang-Chi Hsieh <
>>> viirya@gmail.com> wrote:
>>>
>>>
>>> Hi devs,
>>>
>>> In Spark Structured Streaming, chained stateful operators may produce
>>> incorrect results under the global watermark. SPARK-33259
>>> (https://issues.apache.org/jira/browse/SPARK-33259) has an example
>>> demonstrating what the correctness issue could be.
>>>
>>> Currently we don't prevent users from running such queries. However, the
>>> possible correctness issue with chained stateful operators in a streaming
>>> query is not obvious to users. From a user's perspective, it may well be
>>> considered a Spark bug, as in SPARK-33259. In the worse case, users are
>>> not aware of the correctness issue at all and use the wrong results.
>>>
>>> IMO, it is better to disable such queries and let users choose to run the
>>> query if they understand the risk, instead of implicitly running the query
>>> and letting users find out about the correctness issue by themselves.
>>>
>>> I would like to propose disabling, by default, streaming queries with a
>>> possible correctness issue in chained stateful operators. The behavior can
>>> be controlled by a SQL config, so users who understand the risk and still
>>> want to run the query can disable the check.
>>>
>>> In the PR (https://github.com/apache/spark/pull/30210), the concern raised
>>> so far is that this changes the current behavior and, by default, will
>>> break some existing streaming queries. But I think it is pretty easy to
>>> disable the check with the new config. There is currently no objection in
>>> the PR, only a suggestion to hear more voices. Please let me know if you
>>> have any thoughts.
>>>
>>> Thanks.
>>> Liang-Chi Hsieh
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
