spark-user mailing list archives

From Julia Wistance <julia.wista...@gmail.com>
Subject Re: Kafka Consumer Pre Fetch Messages + Async commits
Date Tue, 29 Aug 2017 14:44:45 GMT
Thanks, Cody, for the reply. My thinking was that time is required anyway to
write and commit the offsets to any of the external systems, which are all
synchronous.
So why not allow a sync commit to Kafka itself to store the offsets? It would
avoid adding another dependency on the application side, such as checking
whether MySQL is up.

Regards,
JW

On Mon, Aug 28, 2017 at 10:38 PM, Cody Koeninger <cody@koeninger.org> wrote:

> 1. No, prefetched message offsets aren't exposed.
>
> 2. No, I'm not aware of any plans for sync commit, and I'm not sure
> that makes sense.  You have to be able to deal with repeat messages in
> the event of failure in any case, so the only difference sync commit
> would make would be (possibly) slower run time.
>
> On Sat, Aug 26, 2017 at 1:07 AM, Julia Wistance
> <julia.wistance@gmail.com> wrote:
> > Hi Experts,
> >
> > A question on what could potentially happen with Spark Streaming 2.2.0 +
> > Kafka. LocationStrategies says that "new Kafka consumer API will pre-fetch
> > messages into buffers."
> > If we store offsets in Kafka, currently we can only use async commits.
> >
> > So,
> > 1 - Could it happen that we commit offsets that we haven't processed yet but
> > that the Kafka consumer has prefetched?
> > 2 - Are there plans to support a sync commit? Although we can go for an
> > alternate store for offsets like HBase, ZooKeeper, or MySQL, the code would
> > wait until the offsets are stored in any of these systems. Would it
> > make sense for Spark / Kafka to also add a sync commit option?
> >
> > Appreciate the reply.
> > JW
> >
>
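
Cody's second point (repeat messages have to be handled after a failure regardless of how the commit is done) can be illustrated with a small simulation. This is only a sketch in plain Python and does not use Spark or Kafka: `FakeLog`, `run_batch`, and the dict used as a sink are hypothetical stand-ins, and the "crash" is just an early return. The committed offset here is derived from the batch that was actually processed, and the sink is keyed by offset so that a replayed message overwrites its earlier output instead of duplicating it.

```python
from typing import Dict, List, Tuple


class FakeLog:
    """Hypothetical stand-in for one partition plus its committed offset."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed = 0  # next offset to read after a (re)start

    def read_from_committed(self, batch_size: int) -> List[Tuple[int, str]]:
        start = self.committed
        end = min(start + batch_size, len(self.messages))
        return [(off, self.messages[off]) for off in range(start, end)]


def run_batch(log: FakeLog, sink: Dict[int, str], batch_size: int,
              crash_before_commit: bool = False) -> int:
    """Process one batch, then commit. Returns how many messages were processed."""
    batch = log.read_from_committed(batch_size)
    for offset, value in batch:
        # Idempotent write: the key is the offset, so a replay overwrites.
        sink[offset] = value.upper()
    if crash_before_commit:
        # Simulated failure: output was written, but the commit never happened.
        return len(batch)
    if batch:
        # Commit only what this batch actually processed.
        log.committed = batch[-1][0] + 1
    return len(batch)


log = FakeLog(["a", "b", "c", "d"])
sink: Dict[int, str] = {}
processed = 0
processed += run_batch(log, sink, 2, crash_before_commit=True)  # fails before commit
processed += run_batch(log, sink, 2)  # replays offsets 0 and 1, then commits
processed += run_batch(log, sink, 2)  # handles offsets 2 and 3, then commits
print(processed, len(sink), log.committed)
```

In this run, six messages are processed in total but the sink ends up with four entries, because the replayed offsets 0 and 1 overwrite their earlier results. Making the commit synchronous would not remove the window between writing output and committing, which is why the replay handling is needed either way.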
