spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Structured Streaming Kafka change maxOffsetsPerTrigger won't apply
Date Wed, 20 Nov 2019 11:23:24 GMT
Hi Roland,

Not much information was shared apart from the fact that it's not working. The latest
partition offset is used when the size of a TopicPartition is negative.
This can be verified by checking for the following entry in the logs:

logDebug(s"rateLimit $tp size is $size")
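
That entry is emitted at DEBUG level, so logging for the Kafka source classes has to be raised first. A log4j.properties fragment along these lines should work (the package name `org.apache.spark.sql.kafka010` is assumed here for the Spark 2.4 Kafka source):

```
# Surface the "rateLimit $tp size is $size" entries from the Kafka source
# (assumed package name for the spark-sql-kafka-0-10 connector):
log4j.logger.org.apache.spark.sql.kafka010=DEBUG
```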

If you've double-checked and still think it's an issue, please file a JIRA
and attach the Spark configuration and logs.
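
For context, the rate limiting that produces that log line prorates maxOffsetsPerTrigger across partitions in proportion to each partition's lag. A simplified, self-contained sketch of that idea (not Spark's exact implementation):

```scala
// Simplified sketch of maxOffsetsPerTrigger-style rate limiting:
// the per-trigger limit is split across partitions in proportion
// to each partition's lag (latest offset minus start offset).
object RateLimitSketch {
  def prorate(limit: Long,
              from: Map[String, Long],
              latest: Map[String, Long]): Map[String, Long] = {
    // Lag per partition, never negative.
    val lags = from.map { case (tp, begin) => tp -> math.max(latest(tp) - begin, 0L) }
    val total = lags.values.sum
    if (total <= limit) {
      // Under the limit: the batch can read up to the latest offsets.
      latest
    } else {
      // Over the limit: give each partition a share proportional to its lag,
      // capped at its latest available offset.
      from.map { case (tp, begin) =>
        val share = (limit.toDouble * lags(tp) / total).toLong
        tp -> math.min(begin + share, latest(tp))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val from   = Map("t-0" -> 0L, "t-1" -> 0L)
    val latest = Map("t-0" -> 300L, "t-1" -> 100L)
    // The partition with three times the lag gets three times the share.
    println(prorate(100L, from, latest))
  }
}
```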

BR,
G


On Wed, Nov 20, 2019 at 9:33 AM Roland Johann
<roland.johann@phenetic.io.invalid> wrote:

> Hi All,
>
> changing maxOffsetsPerTrigger and restarting the job does not affect the
> batch size. This is problematic, as we currently use a trigger duration of
> 5 minutes which consumes only 100k messages despite an offset lag in the
> billions. Decreasing the trigger duration also affects the micro-batch size -
> but then it's only a few hundred. The Spark version in use is 2.4.4.
>
> I assume that Spark uses previous micro-batch sizes and runtimes to
> somehow calculate current batch sizes based on the trigger duration. AFAIK
> Structured Streaming isn't back-pressure aware, so this behavior is strange
> on multiple levels.
>
> Any help appreciated.
>
> Kind Regards
> Roland
>
