spark-dev mailing list archives

From "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
Subject Re: Question about upgrading Kafka client version
Date Fri, 10 Mar 2017 21:26:59 GMT
I did some investigation yesterday and just posted my findings in the ticket.
Please read my latest comment in
https://issues.apache.org/jira/browse/SPARK-18057

On Fri, Mar 10, 2017 at 11:41 AM, Cody Koeninger <cody@koeninger.org> wrote:

> There are existing tickets on the issues around kafka versions, e.g.
> https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten
> any committer weigh-in on direction.
>
> On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori <oscarbatori@gmail.com> wrote:
> > Guys,
> >
> > To change the subject from meta-voting...
> >
> > We are doing Spark Streaming against a Kafka setup; everything is pretty
> > standard and pretty current. In particular, we are using Spark 2.1 and
> > Kafka 0.10.1, with batch windows that are quite large (5-10 minutes). The
> > problem we are having is pretty well described in the following excerpt
> > from the Spark documentation:
> > "For possible kafkaParams, see Kafka consumer config docs. If your Spark
> > batch duration is larger than the default Kafka heartbeat session timeout
> > (30 seconds), increase heartbeat.interval.ms and session.timeout.ms
> > appropriately. For batches larger than 5 minutes, this will require
> > changing group.max.session.timeout.ms on the broker. Note that the example
> > sets enable.auto.commit to false, for discussion see Storing Offsets below."
> >
> > In our case "group.max.session.timeout.ms" is set to its default value, and
> > our processing time per batch easily exceeds that value. I did some further
> > hunting around and found the following SO post:
> > "KIP-62, decouples heartbeats from calls to poll() via a background
> > heartbeat thread. This, allow for a longer processing time (ie, time
> > between two consecutive poll()) than heartbeat interval."
> >
> > This pretty accurately describes our scenario: effectively our per-batch
> > processing time is 2-6 minutes, well within the batch window, but in
> > excess of the max session timeout between polls, causing the consumer to
> > be kicked out of the group.
> >
> > Are there any plans to move the Kafka client up to 0.10.1 and make this
> > feature available to consumers? Or have I missed some helpful
> > configuration that would ameliorate this problem? I recognize that changing
> > "group.max.session.timeout.ms" is one solution, though doing heartbeat
> > checking outside of implicitly piggybacking on polling seems more
> > elegant.
> >
> > -Oscar
> >
> >
>
>
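
The configuration workaround from the quoted documentation excerpt can be sketched roughly as follows. This is only an illustration, not the Spark or Kafka API: the class name, the `kafkaParams` helper, the 30-second headroom, and the one-third heartbeat ratio are all assumptions made for the example, and the broker cap passed in is a placeholder for whatever group.max.session.timeout.ms is actually set to on your brokers. Only the three config keys come from the thread. The map is a fragment; a real consumer would also need bootstrap servers, deserializers, and a group id.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaTimeoutSketch {

    // Hypothetical headroom added on top of the worst-case batch processing time.
    static final long HEADROOM_MS = 30_000L;

    /**
     * Builds the timeout-related consumer settings for a given worst-case
     * per-batch processing time. Throws if the required session timeout would
     * exceed the broker-side cap (group.max.session.timeout.ms), because in
     * that case the broker config has to change as well.
     */
    static Map<String, Object> kafkaParams(long maxProcessingMs, long brokerMaxSessionTimeoutMs) {
        long sessionTimeoutMs = maxProcessingMs + HEADROOM_MS;
        if (sessionTimeoutMs > brokerMaxSessionTimeoutMs) {
            throw new IllegalArgumentException(
                "session.timeout.ms=" + sessionTimeoutMs
                + " exceeds broker group.max.session.timeout.ms=" + brokerMaxSessionTimeoutMs);
        }
        Map<String, Object> params = new HashMap<>();
        params.put("session.timeout.ms", Math.toIntExact(sessionTimeoutMs));
        // The heartbeat interval must stay below the session timeout;
        // one third of it is used here as an assumed rule of thumb.
        params.put("heartbeat.interval.ms", Math.toIntExact(sessionTimeoutMs / 3));
        // Mirrors the documentation example, which disables auto commit.
        params.put("enable.auto.commit", false);
        return params;
    }

    public static void main(String[] args) {
        // Placeholder broker cap of 5 minutes; check the real broker setting.
        long brokerCapMs = 300_000L;

        // A 2-minute batch fits under the cap.
        System.out.println(kafkaParams(120_000L, brokerCapMs));

        // A 6-minute batch does not, so the broker cap must be raised first.
        try {
            kafkaParams(360_000L, brokerCapMs);
        } catch (IllegalArgumentException e) {
            System.out.println("Broker change needed: " + e.getMessage());
        }
    }
}
```

With per-batch processing times of 2-6 minutes as described above, the upper end of that range hits the second case, which is why a client with the KIP-62 background heartbeat thread is the more attractive fix.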
