spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Spark Streaming backpressure weird behavior/bug
Date Wed, 02 Nov 2016 15:43:37 GMT
Does that batch actually have that many records in it (you should be able
to see beginning and ending offsets in the logs), or is it an error in the
UI?
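
A minimal sketch of logging those beginning and ending offsets per batch,
assuming the Kafka 0.10 direct stream API shipped with Spark 2.0 (the
`stream` variable name is hypothetical, not from the original job):

    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

    // 'stream' is the DStream returned by KafkaUtils.createDirectStream
    stream.foreachRDD { rdd =>
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        // count() = untilOffset - fromOffset, i.e. records pulled for this
        // partition in this batch
        println(s"${r.topic}-${r.partition}: from=${r.fromOffset} until=${r.untilOffset} count=${r.count()}")
      }
    }

Comparing those per-partition counts against the batch size reported in the
UI should show whether the huge number is real or a display artifact.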


On Tue, Nov 1, 2016 at 11:59 PM, map reduced <k3t.git.1@gmail.com> wrote:

> Hi guys,
>
> I am using a Spark 2.0.0 standalone cluster, with a regular streaming job
> consuming from Kafka and writing to an HTTP endpoint. I have the following
> configuration:
> executors: 7 cores/executor, maxCores = 84 (so 12 executors)
> batch size - 90 seconds
> maxRatePerPartition - 2000
> backpressure enabled = true
>
> My Kafka topics have a total of 300 partitions, so I am expecting at most
> 54 million records per batch (maxRatePerPartition * batch size * #partitions)
> - and that's what I am getting. But it turns out it can't process 54 million
> records within a 90-second batch, so I expect backpressure to kick in, but I
> see something strange there. It reduces the batch size to a smaller number of
> records, but then suddenly spits out a HUGE batch of 13 billion records.
>
> [image: Inline image 1]
> I changed some configuration to see if the above was a one-off case, but the
> same issue happened again. Check the screenshot below (a huge batch of
> 14 billion records again!):
>
> [image: Inline image 2]
>
> Is this a bug? Is there any reason you know of for this to happen?
>
> Thanks,
> KP
>
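
For reference, a minimal sketch of how that configuration and the expected
per-batch ceiling map onto the standard Spark Streaming config keys (the app
name and overall structure are assumptions, not taken from the original job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("kafka-to-http")                               // hypothetical name
      .set("spark.cores.max", "84")                              // maxCores = 84
      .set("spark.executor.cores", "7")                          // 7 cores/executor => 12 executors
      .set("spark.streaming.kafka.maxRatePerPartition", "2000")  // records/sec per partition
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(90))            // 90-second batches

    // Expected ceiling per batch:
    // 2000 records/sec/partition * 90 sec * 300 partitions = 54,000,000 records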
