spark-user mailing list archives

From: Ryan
Subject: Re: Different watermark for different kafka partitions in Structured Streaming
Date: Sat, 02 Sep 2017 02:36:14 GMT
I don't think Structured Streaming currently supports a "partitioned"
watermark. Also, why does the consumption rate vary between partitions? If
the handling logic is quite different, using separate topics is a better way.

On Fri, Sep 1, 2017 at 4:59 PM, 张万新 <> wrote:

> Thanks. It's true that a looser watermark guarantees that less data is
> dropped, but at the same time more state needs to be kept. I was just
> wondering whether something like Flink's Kafka-partition-aware watermark
> would be a better solution in Structured Streaming.
> On Thu, Aug 31, 2017 at 9:13 AM, Tathagata Das <> wrote:
>> Why not set a looser watermark, one that works across all partitions?
>> The main use of the watermark is to drop state. If you loosen the
>> watermark threshold (e.g. from 1 hour to 10 hours), then you will keep more
>> state with older data, but you are guaranteed that you will not drop
>> important data.
>> On Wed, Aug 30, 2017 at 7:41 AM, KevinZwx <> wrote:
>>> Hi,
>>> I'm working with Structured Streaming to process logs from Kafka, and I use
>>> a watermark to handle late events. Currently the watermark is computed as
>>> (max event time seen by the engine - late threshold), and the same
>>> watermark is used for all partitions.
>>> But in a production environment it frequently happens that different
>>> partitions are consumed at different speeds. Consumption of some partitions
>>> may lag behind, so the newest event time in those partitions may be much
>>> smaller than in the others. In this case, using the same watermark for all
>>> partitions may cause heavy data loss.
>>> So is there any way to use a different watermark for each Kafka partition,
>>> or any plan to work on this?
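
To make the trade-off in this thread concrete, here is a minimal sketch in plain Python (not Spark code). It assumes a simplified model: the watermark is updated at the end of each micro-batch as (max event time seen - late threshold), it never moves backwards, and any event older than the current watermark is dropped. The function name count_dropped, the batch layout, and the per_partition option are hypothetical and only for illustration. As noted in the reply at the top, Structured Streaming does not offer a partition-aware watermark; the per_partition branch only mimics the idea by taking the minimum of the per-partition maxima. Times are in minutes.

```python
def count_dropped(batches, delay, per_partition=False):
    """Count events dropped as "too late" in a simplified watermark model.

    batches: list of micro-batches, each a list of (partition, event_time).
    delay: late threshold, in the same unit as event_time.
    per_partition: False -> watermark = global max event time - delay
                   True  -> hypothetical variant:
                            watermark = min over partitions of
                            (max event time per partition) - delay
    """
    max_seen = {}                 # partition -> max event time seen so far
    watermark = float("-inf")     # no watermark before the first batch
    dropped = 0

    for batch in batches:
        # Events older than the watermark computed from earlier batches are dropped.
        for _partition, event_time in batch:
            if event_time < watermark:
                dropped += 1

        # Track the max event time per partition (all events count here).
        for partition, event_time in batch:
            max_seen[partition] = max(max_seen.get(partition, event_time), event_time)

        # Advance the watermark at the end of the batch; it never decreases.
        base = min(max_seen.values()) if per_partition else max(max_seen.values())
        watermark = max(watermark, base - delay)

    return dropped


# Partition 0 is consumed promptly; partition 1 lags about 8 hours behind.
batches = [
    [(0, 600), (1, 100)],
    [(0, 660), (1, 130)],
    [(0, 720), (1, 160)],
]

print(count_dropped(batches, delay=60))                      # 1-hour threshold
print(count_dropped(batches, delay=600))                     # 10-hour threshold
print(count_dropped(batches, delay=60, per_partition=True))  # hypothetical
```

With this toy input, the 1-hour global watermark drops both later events from the lagging partition, while the 10-hour global watermark and the hypothetical per-partition variant drop none. The looser threshold achieves this at the cost of keeping state for a much longer window, which is the concern raised earlier in the thread.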
