spark-dev mailing list archives

From Liang-Chi Hsieh <vii...@gmail.com>
Subject Re: [DISCUSS] Disable streaming query with possible correctness issue by default
Date Wed, 11 Nov 2020 19:29:40 GMT

Thanks all for the responses!

Based on these responses, I think we can go forward with the PR. I will put
the new config in the migration guide. Please help review the PR if you have
more comments.

Thank you!


Yuanjian Li wrote
> Already +1 in the PR. It would be great to mention the new config in the
> SS migration guide.
>
> Ryan Blue <rblue@.com> wrote on Wed, Nov 11, 2020 at 7:48 AM:
> 
>> +1, I agree with Tom.
>>
>> On Tue, Nov 10, 2020 at 3:00 PM Dongjoon Hyun <dongjoon.hyun@> wrote:
>>
>>> +1 for Apache Spark 3.1.0.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Tue, Nov 10, 2020 at 6:17 AM Tom Graves <tgraves_cs@.com> wrote:
>>>
>>>> +1. Since it's a correctness issue, I think it's OK to change the
>>>> behavior to make sure users are aware of it and let them decide.
>>>>
>>>> Tom
>>>>
>>>> On Saturday, November 7, 2020, 01:00:11 AM CST, Liang-Chi Hsieh
>>>> <viirya@> wrote:
>>>>
>>>>
>>>> Hi devs,
>>>>
>>>> In Spark Structured Streaming, chained stateful operators can produce
>>>> incorrect results under the global watermark. SPARK-33259
>>>> (https://issues.apache.org/jira/browse/SPARK-33259) has an example
>>>> demonstrating what the correctness issue could be.
>>>>
>>>> Currently we don't prevent users from running such queries. The
>>>> possible correctness issue with chained stateful operators in a
>>>> streaming query is not obvious to users. From the user's perspective,
>>>> it may be considered a Spark bug, as in SPARK-33259. In the worse
>>>> case, users are not aware of the correctness issue and use wrong
>>>> results.
>>>>
>>>> IMO, it is better to disable such queries and let users choose to run
>>>> them if they understand the risk, instead of running the query
>>>> implicitly and leaving users to find the correctness issue by
>>>> themselves.
>>>>
>>>> I would like to propose disabling streaming queries with a possible
>>>> correctness issue in chained stateful operators. The behavior can be
>>>> controlled by a SQL config, so if users understand the risk and still
>>>> want to run the query, they can disable the check.
>>>>
>>>> In the PR (https://github.com/apache/spark/pull/30210), the concern
>>>> raised so far is that this changes current behavior and will break
>>>> some existing streaming queries by default. But I think it is pretty
>>>> easy to disable the check with the new config. There is currently no
>>>> objection in the PR, only a suggestion to hear more voices. Please
>>>> let me know if you have any thoughts.
>>>>
>>>> Thanks.
>>>> Liang-Chi Hsieh
>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>





--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

