spark-dev mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Question about upgrading Kafka client version
Date Fri, 10 Mar 2017 19:41:27 GMT
There are existing tickets on the issues around Kafka versions, e.g.
https://issues.apache.org/jira/browse/SPARK-18057, that haven't gotten
any committer weigh-in on direction.

On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori <oscarbatori@gmail.com> wrote:
> Guys,
>
> To change the subject from meta-voting...
>
> We are doing Spark Streaming against a Kafka setup, everything is pretty
> standard, and pretty current. In particular we are using Spark 2.1, and
> Kafka 0.10.1, with batch windows that are quite large (5-10 minutes). The
> problem we are having is pretty well described in the following excerpt from
> the Spark documentation:
> "For possible kafkaParams, see Kafka consumer config docs. If your Spark
> batch duration is larger than the default Kafka heartbeat session timeout
> (30 seconds), increase heartbeat.interval.ms and session.timeout.ms
> appropriately. For batches larger than 5 minutes, this will require changing
> group.max.session.timeout.ms on the broker. Note that the example sets
> enable.auto.commit to false, for discussion see Storing Offsets below."
>
> In our case "group.max.session.timeout.ms" is set to its default value, and
> our processing time per batch easily exceeds that value. I did some further
> hunting around and found the following SO post:
> "KIP-62 decouples heartbeats from calls to poll() via a background
> heartbeat thread. This allows for a longer processing time (i.e., time
> between two consecutive poll() calls) than the heartbeat interval."
>
> This pretty accurately describes our scenario: effectively our per batch
> processing time is 2-6 minutes, well within the batch window, but in excess
> of the max session timeout between polls, causing the consumer to be kicked
> out of the group.
>
> Are there any plans to move the Kafka client up to 0.10.1 and make this
> feature available to consumers? Or have I missed some helpful configuration
> that would ameliorate this problem? I recognize that changing
> "group.max.session.timeout.ms" is one solution, though doing heartbeat
> checking independently of polling, rather than implicitly piggybacking on
> it, seems more elegant.
>
> -Oscar
>
>
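
For illustration, the two configuration routes discussed in the quoted
message can be sketched as plain arithmetic. This is a minimal sketch, not
Spark or Kafka code: the function names, the 1.5x safety margin, and the
assumed broker limit of 300000 ms (the 5 minutes mentioned in the docs
excerpt) are all hypothetical. The config keys themselves are real Kafka
settings; max.poll.interval.ms is the one introduced by KIP-62, and whether
a given Spark build can use it depends on the bundled Kafka client.

```python
# Hypothetical helpers: names, margin, and the broker limit are assumptions.
ASSUMED_BROKER_MAX_SESSION_MS = 300_000  # assumed group.max.session.timeout.ms (5 min)


def pre_kip62_params(batch_ms, broker_max_ms=ASSUMED_BROKER_MAX_SESSION_MS, margin=1.5):
    """Settings for a client whose heartbeats piggyback on poll().

    Returns (consumer_params, needs_broker_change).
    """
    session_timeout_ms = int(batch_ms * margin)
    params = {
        "session.timeout.ms": session_timeout_ms,
        # Kafka's docs suggest a heartbeat interval of at most 1/3 of the session timeout.
        "heartbeat.interval.ms": session_timeout_ms // 3,
        "enable.auto.commit": False,
    }
    # The broker rejects session timeouts above group.max.session.timeout.ms,
    # so the broker setting has to be raised in that case.
    needs_broker_change = session_timeout_ms > broker_max_ms
    return params, needs_broker_change


def kip62_params(batch_ms, margin=1.5):
    """Settings for a client with a background heartbeat thread (KIP-62).

    Only the allowed gap between poll() calls needs to cover the batch;
    session.timeout.ms can stay at its default.
    """
    return {
        "max.poll.interval.ms": int(batch_ms * margin),
        "enable.auto.commit": False,
    }


if __name__ == "__main__":
    ten_minutes_ms = 10 * 60 * 1000
    print(pre_kip62_params(ten_minutes_ms))
    print(kip62_params(ten_minutes_ms))
```

With a 10 minute batch the first route asks for a session timeout above the
assumed broker limit, which is the situation described above; the second
route leaves the session timeout alone and only widens the poll interval.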


