kafka-users mailing list archives

From Hisham Mardam-Bey <his...@mate1inc.com>
Subject Re: Exactly once semantics
Date Fri, 09 Dec 2011 02:54:03 GMT
On Thu, Dec 8, 2011 at 9:36 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> Hisham,
>
>>> Does this mean that if autocommit.enable is set to true then calling
>>> commitOffsets() does nothing?
>
> No. It means that in addition to the automatic offset commit, you will ask
> the consumer to commit offsets when you want.

Perfect.
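Neha's point above — that with auto-commit enabled an explicit commitOffsets() call is additive, never a no-op — can be modeled in a few lines. This is a toy sketch, not the real ConsumerConnector; the class and method names here are made up for illustration:

```python
# Toy model: both the periodic auto-commit and an explicit manual
# commit write the current consumed offset. Neither disables the other.

class OffsetTracker:
    def __init__(self):
        self.consumed = -1   # highest offset handed to the application
        self.commits = []    # every checkpoint write, in order

    def consume(self, offset):
        self.consumed = offset

    def commit(self, source):
        # 'source' is just a label: the auto-commit timer or an
        # explicit commitOffsets() call. Both write the same checkpoint.
        self.commits.append((source, self.consumed))

t = OffsetTracker()
t.consume(4)
t.commit("auto")      # periodic auto-commit fires
t.consume(9)
t.commit("manual")    # explicit commitOffsets() -- still takes effect
```

Both writes land, which matches the answer above: the manual call simply adds an extra checkpoint on top of the automatic ones.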

>>> My goal is to signal the consumer and
>>> ask it to stop consuming / processing messages, call commitOffsets(),
>
> When you call commitOffsets, ONLY offsets for the messages returned by the
> consumer iterator will be committed. It will
> not prematurely commit data that you haven't consumed.

Fantastic as well.
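The shutdown sequence discussed above can be simulated like this. ToyConsumer and its methods are stand-ins for the Kafka 0.7 high-level consumer, so treat it as a sketch of the semantics rather than real API usage:

```python
# Toy model of the shutdown sequence: stop pulling, commit only what
# the iterator has actually returned, then shut down. Messages still
# sitting in the fetch buffer are never prematurely committed.

class ToyConsumer:
    def __init__(self, messages):
        self._messages = list(messages)  # what the broker would hand us
        self._next = 0                   # iterator position
        self.committed_offset = -1       # last checkpointed offset
        self.running = True

    def poll(self):
        """Return the next message, like the consumer iterator."""
        if not self.running or self._next >= len(self._messages):
            return None
        msg = self._messages[self._next]
        self._next += 1
        return msg

    def commit_offsets(self):
        # Commits only offsets of messages already returned by the
        # iterator -- never data still buffered but unconsumed.
        self.committed_offset = self._next - 1

    def shutdown(self):
        self.running = False

consumer = ToyConsumer(["m0", "m1", "m2", "m3"])
processed = [consumer.poll(), consumer.poll()]  # consume two messages
consumer.shutdown()        # 1. signal: stop consuming
consumer.commit_offsets()  # 2. checkpoint what was actually consumed
# m2/m3 were fetched but never returned by the iterator, so the
# committed offset points at m1 and a restart would resume at m2.
```

The key invariant is in commit_offsets(): the checkpoint tracks the iterator, not the fetch buffer, which is why the shutdown plan above is safe.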

Thanks again!

hmb.

> Thanks,
> Neha
>
> On Thu, Dec 8, 2011 at 6:29 PM, Hisham Mardam-Bey <hisham@mate1inc.com> wrote:
>
>> On Thu, Dec 8, 2011 at 3:47 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>> > Evan,
>> >
>> > Please look at autocommit.enable at
>> > http://incubator.apache.org/kafka/configuration.html
>> > If it is false, you can control the offset storage via the commitOffsets
>> > API call.
>>
>> Does this mean that if autocommit.enable is set to true, then calling
>> commitOffsets() does nothing? My goal is to signal the consumer and
>> ask it to stop consuming / processing messages, call commitOffsets(),
>> then shut down the consumer. Would this work, or do I also have to
>> worry about what has been pulled from the broker (in a batch, maybe
>> sitting in a buffer) but not yet consumed?
>>
>> Thanks,
>>
>> hmb.
>>
>> >>> So, commit the offset when you have an ack, however that is defined;
>> >>> roll back to an earlier offset when you don't get acks,
>> >>> and de-dup as necessary.
>> >
>> > Sounds like you can use commitOffsets() right after getting an ack.
>> >
>> > Thanks,
>> > Neha
>> >
>> > On Thu, Dec 8, 2011 at 12:44 PM, Evan Chan <ev@ooyala.com> wrote:
>> >
>> >> What you mean is that we need to modify (have our own modified copy
>> >> of) the high level consumer (specifically the ConsumerConnector) so
>> >> that instead of automatically calling commitOffset(), we can call
>> >> commitOffset() at our own discretion, when we know that the messages
>> >> have gotten to their destination.
>> >>
>> >> I am planning to do this BTW for a similar use case.
>> >> Exactly once == at least once + de-duplication.
>> >> So, commit the offset when you have an ack, however that is defined;
>> >> roll back to an earlier offset when you don't get acks,
>> >> and de-dup as necessary.
>> >>
>> >> -Evan
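Evan's recipe above (exactly once == at least once + de-duplication) can be sketched as follows. The integer message ids and the seen-set are illustrative assumptions; real deployments often key on (topic, partition, offset) or an application-level id:

```python
# Sketch of "exactly once == at least once + de-dup": the delivery
# stream may replay messages after a rollback, so the processor keeps
# a set of already-acked ids and skips redeliveries.

def process_exactly_once(deliveries, sink, seen):
    """Apply each message's effect once, even if the stream redelivers it."""
    for msg_id, payload in deliveries:
        if msg_id in seen:    # redelivered after a rollback: skip
            continue
        sink.append(payload)  # the "real" work (must happen before the ack)
        seen.add(msg_id)      # remember the ack for de-duplication

sink, seen = [], set()
# at-least-once delivery: message 1 shows up twice after a replay
deliveries = [(0, "a"), (1, "b"), (1, "b"), (2, "c")]
process_exactly_once(deliveries, sink, seen)
```

Note the ordering: the effect is applied before the id is recorded, so a crash between the two re-applies rather than drops a message — which is exactly why the downstream de-dup step is needed.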
>> >>
>> >>
>> >> On Thu, Dec 8, 2011 at 10:03 AM, Jun Rao <junrao@gmail.com> wrote:
>> >>
>> >> > Neha is right. It's possible to achieve exactly-once delivery even
>> >> > in the high level consumer. What you have to do is make sure all
>> >> > consumed messages are really consumed and then call commitOffset.
>> >> > When you call commitOffset, all messages returned to the apps
>> >> > should have been fully consumed or put in a safe place.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
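Jun's rule above — commit only once every returned message is fully consumed or in a safe place — is the checkpoint-and-rewind pattern he describes further down the thread. A toy simulation, with made-up names and an in-memory list standing in for the broker's log:

```python
# Simulation of checkpoint-and-rewind: flush processed messages in
# batches, checkpoint the offset of the last flush, and rewind to the
# checkpoint after a crash. Unflushed work is redelivered, never lost.

def consume(log, start, flush_every, crash_after=None):
    """Return (flushed_messages, checkpoint) after consuming `log`."""
    flushed, buffer, checkpoint = [], [], start - 1
    for offset in range(start, len(log)):
        if crash_after is not None and offset > crash_after:
            break                   # simulate a crash mid-batch
        buffer.append(log[offset])
        if len(buffer) == flush_every:
            flushed.extend(buffer)  # "flush to disk"
            buffer = []
            checkpoint = offset     # checkpoint only what is flushed
    return flushed, checkpoint

log = ["m0", "m1", "m2", "m3", "m4", "m5"]
# first run crashes after offset 2: only the first flush (m0, m1) and
# its checkpoint survive; the buffered m2 is lost from memory
flushed1, ckpt = consume(log, start=0, flush_every=2, crash_after=2)
# restart rewinds to just past the checkpoint: m2 is redelivered
flushed2, _ = consume(log, start=ckpt + 1, flush_every=2)
```

Across both runs every message is flushed exactly once, because the checkpoint never runs ahead of the flush.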
>> >> >
>> >> > On Thu, Dec 8, 2011 at 9:52 AM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>> >> >
>> >> > > Mark,
>> >> > >
>> >> > > >> Is that correct? Did you mean SimpleConsumer or
>> >> > > >> HighLevelConsumer? What are the differences?
>> >> > >
>> >> > > The high level consumer checkpoints the offsets in zookeeper,
>> >> > > either periodically or based on an API call (look at
>> >> > > commitOffsets()).
>> >> > >
>> >> > > If you want to checkpoint each and every message offset,
>> >> > > exactly-once semantics will be expensive. But if you are willing
>> >> > > to tolerate a small window of duplicates, you could buffer and
>> >> > > write the offsets in batches. If you choose to do the former, the
>> >> > > commitOffsets() approach is expensive, since it can lead to too
>> >> > > many writes on zookeeper. If you choose the latter, it could be
>> >> > > fine, and you can use the high level consumer itself.
>> >> > >
>> >> > > On the contrary, if your consumer is writing the messages to
>> >> > > some database or persistent storage, you might be better off
>> >> > > using SimpleConsumer. There was another discussion about making
>> >> > > the offset storage of the high level consumer pluggable, but we
>> >> > > don't have that feature yet.
>> >> > >
>> >> > > Thanks,
>> >> > > Neha
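The batching trade-off Neha describes above is easy to quantify: one ZooKeeper write per message versus one write per batch of N, at the cost of up to N-1 duplicates after a crash. A small illustrative helper (the function name and numbers are assumptions, not anything from Kafka itself):

```python
# How many offset writes (e.g. to ZooKeeper) does each strategy cost?
# Committing every message: one write per message. Committing every N
# messages: roughly 1/N as many writes, with up to N-1 duplicates
# possible after a failure.

def commit_writes(num_messages, batch_size):
    """Number of offset commits: one per full batch, plus a tail commit."""
    full, tail = divmod(num_messages, batch_size)
    return full + (1 if tail else 0)

# per-message commits for 10,000 messages: 10,000 writes
# batches of 100: 100 writes, at the cost of <= 99 redelivered
# messages if the consumer crashes between commits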
>> >> > >
>> >> > >
>> >> > > On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <junrao@gmail.com> wrote:
>> >> > >
>> >> > > > Currently, the high level consumer (with ZK integration)
>> >> > > > doesn't expose offsets to the consumer. Only SimpleConsumer
>> >> > > > does.
>> >> > > >
>> >> > > > Jun
>> >> > > >
>> >> > > > On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void.dev@gmail.com> wrote:
>> >> > > >
>> >> > > > > "This is only possible through SimpleConsumer right now."
>> >> > > > >
>> >> > > > >
>> >> > > > > Is that correct? Did you mean SimpleConsumer or
>> >> > > > > HighLevelConsumer? What are the differences?
>> >> > > > >
>> >> > > > >
>> >> > > > > On 12/8/11 8:53 AM, Jun Rao wrote:
>> >> > > > >
>> >> > > > >> Mark,
>> >> > > > >>
>> >> > > > >> Today, this is mostly the responsibility of the consumer,
>> >> > > > >> by managing the offsets properly. For example, if the
>> >> > > > >> consumer periodically flushes messages to disk, it has to
>> >> > > > >> checkpoint to disk the offset corresponding to the last
>> >> > > > >> flush. On failure, the consumer has to rewind the
>> >> > > > >> consumption from the last checkpointed offset. This is only
>> >> > > > >> possible through SimpleConsumer right now.
>> >> > > > >>
>> >> > > > >> Thanks,
>> >> > > > >>
>> >> > > > >> Jun
>> >> > > > >>
>> >> > > > >> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void.dev@gmail.com> wrote:
>> >> > > > >>
>> >> > > > >>> How can one guarantee exactly-once semantics when using
>> >> > > > >>> Kafka as a traditional queue? Is this guarantee the
>> >> > > > >>> responsibility of the consumer?
>> >> > > > >>>
>> >> > > > >>>
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> --
>> >> *Evan Chan*
>> >> Senior Software Engineer |
>> >> ev@ooyala.com | (650) 996-4600
>> >> www.ooyala.com | blog <http://www.ooyala.com/blog> |
>> >> @ooyala<http://www.twitter.com/ooyala>
>> >>
>>



-- 
Hisham Mardam Bey
Director of Engineering | Mate1 Inc.
4200 St. Laurent Boulevard | Suite 550
Montreal, Quebec | H2W 2R2
t. +1.514.393.1414 x264

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

-=[ Codito Ergo Sum ]=-
