spark-user mailing list archives

From map reduced <>
Subject Spark Streaming backpressure weird behavior/bug
Date Wed, 02 Nov 2016 04:59:07 GMT
Hi guys,

I am running a Spark 2.0.0 standalone cluster with a regular streaming job that
consumes from Kafka and writes to an HTTP endpoint. My configuration:
7 cores/executor, maxCores = 84 (so 12 executors)
batch interval - 90 seconds
maxRatePerPartition - 2000
backpressure enabled = true
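For concreteness, here is a sketch of the corresponding Spark config keys as I understand them (the key names are the standard Spark 2.0 streaming/Kafka properties; the dict form is just for illustration, in practice these would go on spark-submit or SparkConf):

```python
# Sketch of the config described above. Key names are real Spark 2.0
# properties; the dict wrapper is only illustrative.
streaming_conf = {
    "spark.cores.max": "84",                              # maxCores: 84/7 = 12 executors
    "spark.executor.cores": "7",
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.kafka.maxRatePerPartition": "2000",  # records/sec per Kafka partition
}
# Note: the 90-second batch interval is not a conf key; it is passed to
# StreamingContext when the context is created.
```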

My Kafka topics have 300 partitions in total, so I expect at most 54 million
records per batch (maxRatePerPartition * batch interval * #partitions), and
that is what I am getting. But the job cannot process 54 million records
within the 90-second batch, so I expect backpressure to kick in. It does
reduce subsequent batches to fewer records, but then it suddenly spits out a
HUGE batch of 13 billion records:
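The expected per-batch cap works out as follows (values taken from the configuration above):

```python
# Expected maximum records per batch under maxRatePerPartition.
max_rate_per_partition = 2000   # records/sec per Kafka partition
batch_interval_s = 90           # seconds
num_partitions = 300            # total partitions across the topics

max_records_per_batch = max_rate_per_partition * batch_interval_s * num_partitions
print(max_records_per_batch)    # 54000000, i.e. 54 million
```

A 13-billion-record batch is roughly 240x that cap, which is why it looks like the backpressure rate estimator, not the static limit, produced it.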

[image: Inline image 1]
I changed some of the configuration to check whether the above was a one-off
case, but the same issue happened again. See the screenshot below (a huge
batch of 14 billion records again!):

[image: Inline image 2]

Is this a bug? Do you know of any reason why this would happen?

