spark-dev mailing list archives

From: Cody Koeninger
Subject: Re: Question about upgrading Kafka client version
Date: Fri, 10 Mar 2017 19:41:27 GMT
There are existing tickets on the issues around kafka versions, e.g. that haven't gotten
any committer weigh-in on direction.

On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori wrote:
> Guys,
> To change the subject from meta-voting...
> We are doing Spark Streaming against a Kafka setup, everything is pretty
> standard, and pretty current. In particular we are using Spark 2.1, and
> Kafka 0.10.1, with batch windows that are quite large (5-10 minutes). The
> problem we are having is pretty well described in the following excerpt from
> the Spark documentation:
> "For possible kafkaParams, see Kafka consumer config docs. If your Spark
> batch duration is larger than the default Kafka heartbeat session timeout
> (30 seconds), increase and
> appropriately. For batches larger than 5 minutes, this will require changing
> on the broker. Note that the example sets
> to false, for discussion see Storing Offsets below."
> In our case "" is set to default value, and our
> processing time per batch easily exceeds that value. I did some further
> hunting around and found the following SO post:
> "KIP-62, decouples heartbeats from calls to poll() via a background
> heartbeat thread. This, allow for a longer processing time (ie, time between
> two consecutive poll()) than heartbeat interval."
> This pretty accurately describes our scenario: effectively our per batch
> processing time is 2-6 minutes, well within the batch window, but in excess
> of the max session timeout between polls, causing the consumer to be kicked
> out of the group.
> Are there any plans to move the Kafka client up to 0.10.1 and make this
> feature available to consumers? Or have I missed some helpful configuration
> that would ameliorate this problem? I recognize changing
> "" is one solution, though checking heartbeats independently of polling,
> rather than implicitly piggybacking on it, seems more elegant.
> -Oscar
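
The timeout relationships discussed in the quoted message can be sketched as a small sanity check. This is only an illustration under stated assumptions: check_timeouts is a hypothetical helper, not part of Spark or Kafka, and the numeric values are made up. The setting names used (session.timeout.ms, heartbeat.interval.ms, max.poll.interval.ms on the consumer, group.max.session.timeout.ms on the broker) are the standard Kafka ones and are assumed to be the settings the quoted documentation refers to.

```python
# Hypothetical helper for illustration only; not part of Spark or Kafka.
def check_timeouts(poll_gap_ms, kafka_params, broker_max_session_ms,
                   heartbeat_thread=False):
    """Return a list of problems; an empty list means none were found.

    poll_gap_ms: longest expected time between two consecutive poll() calls.
    kafka_params: consumer settings, as would be passed in kafkaParams.
    broker_max_session_ms: the broker's group.max.session.timeout.ms.
    heartbeat_thread: True for a client with the KIP-62 background heartbeat.
    """
    problems = []
    session = kafka_params["session.timeout.ms"]
    heartbeat = kafka_params["heartbeat.interval.ms"]

    # Heartbeats must be sent more often than the session expires.
    if heartbeat >= session:
        problems.append("heartbeat.interval.ms must be below session.timeout.ms")

    # The broker rejects session timeouts above its configured maximum.
    if session > broker_max_session_ms:
        problems.append("session.timeout.ms exceeds group.max.session.timeout.ms")

    if heartbeat_thread:
        # With KIP-62 the gap between polls is bounded by a separate setting.
        limit_name = "max.poll.interval.ms"
    else:
        # Without it, heartbeats ride on poll(), so the session timeout applies.
        limit_name = "session.timeout.ms"
    if poll_gap_ms > kafka_params[limit_name]:
        problems.append("time between polls exceeds " + limit_name)

    return problems


if __name__ == "__main__":
    six_minutes = 6 * 60 * 1000  # illustrative worst-case gap between polls

    # Short session timeout, no background heartbeat: consumer gets kicked out.
    short = {"session.timeout.ms": 30000, "heartbeat.interval.ms": 3000}
    print(check_timeouts(six_minutes, short, broker_max_session_ms=300000))

    # Raising the session timeout alone also needs a broker-side change.
    raised = {"session.timeout.ms": 420000, "heartbeat.interval.ms": 60000}
    print(check_timeouts(six_minutes, raised, broker_max_session_ms=300000))

    # KIP-62 style client: the session timeout can stay short.
    kip62 = dict(short, **{"max.poll.interval.ms": 600000})
    print(check_timeouts(six_minutes, kip62, broker_max_session_ms=300000,
                         heartbeat_thread=True))
```

The three calls mirror the options raised above: leaving the defaults, raising the session timeout (which then collides with the broker-side maximum), and relying on a background heartbeat so that only the poll interval has to cover the processing time.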
