spark-user mailing list archives

From Roland Johann <roland.joh...@phenetic.io.INVALID>
Subject Structured Streaming Kafka change maxOffsetsPerTrigger won't apply
Date Wed, 20 Nov 2019 08:33:09 GMT
Hi All,

changing maxOffsetsPerTrigger and restarting the job has no effect on the micro-batch size. This
is a real problem for us: we currently use a trigger duration of 5 minutes, which consumes only
about 100k messages per batch while the offset lag is in the billions. Decreasing the trigger
duration also shrinks the micro-batch size, but then each batch is only a few hundred messages.
The Spark version in use is 2.4.4.
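
For reference, the stream is set up roughly like this (a minimal sketch; topic, brokers, paths,
and the parquet sink are placeholders, not our actual values):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("kafka-backfill").getOrCreate()

// Kafka source; maxOffsetsPerTrigger was raised after the first run,
// but the new value appears to have no effect after restarting from the checkpoint.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .option("maxOffsetsPerTrigger", "10000000")
  .load()

df.writeStream
  .format("parquet")
  .option("path", "/data/out")
  .option("checkpointLocation", "/data/checkpoint") // same checkpoint dir across restarts
  .trigger(Trigger.ProcessingTime("5 minutes"))
  .start()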

I assume that Spark uses previous micro-batch sizes and runtimes to somehow derive the current
batch size from the trigger duration. AFAIK Structured Streaming isn't back-pressure aware,
so this behavior is strange on multiple levels.
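
The documented behavior of maxOffsetsPerTrigger is a flat per-trigger cap, proportionally split
across topic partitions by volume, which makes the adaptive batch sizes we observe even more
surprising. A rough paraphrase of that split, as I understand it (not the actual implementation):

// Cap of maxOffsets per trigger, split across partitions in proportion to each partition's lag.
def rateLimit(maxOffsets: Long,
              from: Map[String, Long],   // start offset per topic-partition
              until: Map[String, Long]   // latest available offset per topic-partition
             ): Map[String, Long] = {
  val lags = from.map { case (tp, begin) => tp -> math.max(until(tp) - begin, 0L) }
  val totalLag = lags.values.sum
  if (totalLag <= maxOffsets) until // everything outstanding fits into one batch
  else from.map { case (tp, begin) =>
    // each partition gets a share of the cap proportional to its lag
    val share = (maxOffsets.toDouble * lags(tp) / totalLag).toLong
    tp -> math.min(begin + share, until(tp))
  }
}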

Any help appreciated.

Kind Regards
Roland