kafka-users mailing list archives

From Hisham Mardam-Bey <his...@mate1inc.com>
Subject Re: Exactly once semantics
Date Fri, 09 Dec 2011 02:29:06 GMT
On Thu, Dec 8, 2011 at 3:47 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> Evan,
>
> Please look at autocommit.enable at
> http://incubator.apache.org/kafka/configuration.html
> If it is false, you can control the offset storage via the commitOffsets
> API call.

Does this mean that if autocommit.enable is set to true, calling
commitOffsets() does nothing? My goal is to signal the consumer to
stop consuming / processing messages, call commitOffsets(), and then
shut down the consumer. Would this work, or do I also need to worry
about messages that have been pulled from the broker (in a batch,
perhaps sitting in a buffer) but that the consumer has not yet
consumed?
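To make the concern concrete, here is a toy model (purely illustrative; the class and its fields are hypothetical, not the Kafka API). A fetcher may run ahead of the application, so the offset committed at shutdown must reflect what was actually processed, not what was fetched:

```python
from collections import deque

class BufferedConsumer:
    """Toy model of a consumer whose fetcher runs ahead of processing.

    Hypothetical sketch, not the Kafka API; it only shows why a
    shutdown commit must record processed messages, not fetched ones.
    """
    def __init__(self):
        self.fetched_offset = 0    # how far the fetcher has read
        self.processed_offset = 0  # how far the app has consumed
        self.buffer = deque()      # fetched but not yet processed

    def fetch_batch(self, messages):
        for m in messages:
            self.buffer.append((self.fetched_offset, m))
            self.fetched_offset += 1

    def process_one(self):
        offset, _msg = self.buffer.popleft()
        self.processed_offset = offset + 1

    def shutdown_commit(self):
        # Commit what was processed, not what was fetched: anything
        # still in the buffer must be re-fetched after restart.
        return self.processed_offset

c = BufferedConsumer()
c.fetch_batch(["a", "b", "c", "d"])
c.process_one()
c.process_one()
# Two messages processed, two still buffered.
print(c.shutdown_commit())   # 2, not 4
```

Committing `fetched_offset` (4) instead would silently skip the two buffered messages on restart.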

Thanks,

hmb.

>> So, commit the offset when you have an ack, however that is defined;
>> rollback to an earlier offset when you don't get acks,
>> and de-dup as necessary.
>
> Sounds like you can use commitOffsets() right after getting an ack.
>
> Thanks,
> Neha
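Neha's ack-then-commit suggestion can be sketched as follows. This is an illustrative model, not the Kafka consumer API: `send`, `commit`, and the offset bookkeeping are stand-ins.

```python
def run_once(messages, start, send, commit):
    """Process messages from `start`, committing the offset only after
    the downstream ack; stop at the first failed send so a retry
    resumes from the last committed offset.
    Illustrative model, not the Kafka consumer API."""
    offset = start
    for payload in messages[start:]:
        if not send(payload):       # no ack: leave the offset where it is
            break
        offset += 1
        commit(offset)              # ack received: safe to advance

    return offset

# A flaky downstream that fails to ack the third send overall.
calls = {"n": 0}
def flaky_send(payload):
    calls["n"] += 1
    return calls["n"] != 3

committed = []
resume = run_once(["a", "b", "c", "d"], 0, flaky_send, committed.append)
# First pass acks "a" and "b", then fails on "c": resume == 2.
resume = run_once(["a", "b", "c", "d"], resume, flaky_send, committed.append)
print(resume)    # 4: the retry replays from offset 2 and completes
```

Note the retry re-sends "c", which is exactly where the de-duplication step below comes in.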
>
> On Thu, Dec 8, 2011 at 12:44 PM, Evan Chan <ev@ooyala.com> wrote:
>
>> What you mean is that we need to modify (have our own modified copy of) the
>> high level consumer (specifically the ConsumerConnector) so that instead of
>> automatically calling commitOffsets(), we can call commitOffsets() at our
>> own discretion, when we know that the messages have gotten to their
>> destination.
>>
>> I am planning to do this BTW for a similar use case.
>> Exactly once == at least once + de-duplication.
>> So, commit the offset when you have an ack, however that is defined;
>> Rollback to an earlier offset when you don't get acks,
>> and de-dup as necessary.
>>
>> -Evan
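Evan's identity, exactly once == at least once + de-duplication, can be sketched with a persistent set of already-seen message ids. This is illustrative only; the message ids and the sink are assumptions, not part of the Kafka API discussed here.

```python
def deliver_exactly_once(messages, sink, seen):
    """At-least-once delivery plus de-duplication by message id.

    `messages` may contain redelivered duplicates (at-least-once);
    `seen` holds the ids already applied, so each id affects `sink`
    exactly once. Illustrative sketch, not the Kafka API.
    """
    for msg_id, payload in messages:
        if msg_id in seen:
            continue            # duplicate from a redelivery: drop it
        sink.append(payload)
        seen.add(msg_id)        # in practice, persisted with the sink

sink, seen = [], set()
# A redelivery after a failure repeats ids 2 and 3.
deliver_exactly_once([(1, "a"), (2, "b"), (3, "c")], sink, seen)
deliver_exactly_once([(2, "b"), (3, "c"), (4, "d")], sink, seen)
print(sink)   # ['a', 'b', 'c', 'd']: each payload applied once
```

In a real system `seen` (or a high-water mark standing in for it) would have to be persisted atomically with the sink, otherwise a crash between the two reintroduces duplicates.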
>>
>>
>> On Thu, Dec 8, 2011 at 10:03 AM, Jun Rao <junrao@gmail.com> wrote:
>>
>> > Neha is right. It's possible to achieve exactly-once delivery even in the
>> > high level consumer. What you have to do is make sure all consumed
>> > messages are really consumed and then call commitOffsets(). When you call
>> > commitOffsets(), all messages returned to the apps should have been fully
>> > consumed or put in a safe place.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Dec 8, 2011 at 9:52 AM, Neha Narkhede <neha.narkhede@gmail.com
>> > >wrote:
>> >
>> > > Mark,
>> > >
>> > > >> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
>> > > >> What are the differences?
>> > >
>> > > The high level consumer check points the offsets in zookeeper, either
>> > > periodically or based on an API call (look at commitOffsets()).
>> > >
>> > > If you want to checkpoint each and every message offset, exactly-once
>> > > semantics will be expensive. But if you are willing to tolerate a small
>> > > window of duplicates, you could buffer and write the offsets in batches.
>> > > If you choose the former, the per-message commitOffsets() approach is
>> > > expensive, since it can lead to too many writes on ZooKeeper. If you
>> > > choose the latter, it could be fine, and you can use the high level
>> > > consumer itself.
>> > >
>> > > On the other hand, if your consumer is writing the messages to some
>> > > database or persistent storage, you might be better off using
>> > > SimpleConsumer. There was another discussion about making the offset
>> > > storage of the high level consumer pluggable, but we don't have that
>> > > feature yet.
>> > >
>> > > Thanks,
>> > > Neha
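The trade-off Neha describes (commit every N messages instead of every message, accepting up to N-1 duplicates after a crash) can be sketched like this; the `commit` callback is a stand-in for commitOffsets(), not the real API:

```python
def consume_with_batched_commits(messages, batch_size, commit):
    """Commit once per `batch_size` messages instead of per message,
    cutting offset writes (e.g. to ZooKeeper) by that factor. The
    cost: after a crash, up to batch_size - 1 messages are redelivered.
    Illustrative sketch; `commit` stands in for commitOffsets()."""
    for offset, _payload in enumerate(messages):
        if (offset + 1) % batch_size == 0:
            commit(offset + 1)    # first offset to read after a restart

commits = []
consume_with_batched_commits(["m"] * 10, 4, commits.append)
print(commits)   # [4, 8]; a crash here would replay offsets 8 and 9
```

A larger `batch_size` means fewer ZooKeeper writes but a wider duplicate window, which is exactly the knob being discussed.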
>> > >
>> > >
>> > > On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <junrao@gmail.com> wrote:
>> > >
>> > > > Currently, the high level consumer (with ZK integration) doesn't
>> > > > expose offsets to the consumer. Only SimpleConsumer does.
>> > > >
>> > > > Jun
>> > > >
>> > > > On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void.dev@gmail.com> wrote:
>> > > >
>> > > > > "This is only possible through SimpleConsumer right now."
>> > > > >
>> > > > >
>> > > > > Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
>> > > > > What are the differences?
>> > > > >
>> > > > >
>> > > > > On 12/8/11 8:53 AM, Jun Rao wrote:
>> > > > >
>> > > > >> Mark,
>> > > > >>
>> > > > >> Today, this is mostly the responsibility of the consumer, by
>> > > > >> managing the offsets properly. For example, if the consumer
>> > > > >> periodically flushes messages to disk, it has to checkpoint to disk
>> > > > >> the offset corresponding to the last flush. On failure, the consumer
>> > > > >> has to rewind the consumption from the last checkpointed offset.
>> > > > >> This is only possible through SimpleConsumer right now.
>> > > > >>
>> > > > >> Thanks,
>> > > > >>
>> > > > >> Jun
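Jun's checkpoint-on-flush pattern can be sketched as follows; the file layout and function names are assumptions for illustration, not SimpleConsumer code:

```python
import json
import os
import tempfile

def flush_with_checkpoint(buffered, next_offset, data_path, ckpt_path):
    """Flush buffered messages to disk, then checkpoint the offset
    corresponding to that flush. If the process dies before the
    checkpoint is written, restart rewinds to the previous checkpoint
    and replays. Illustrative sketch, not SimpleConsumer code."""
    with open(data_path, "a") as f:
        for m in buffered:
            f.write(m + "\n")
        f.flush()
        os.fsync(f.fileno())
    # Write the checkpoint via atomic rename, so a crash leaves either
    # the old offset or the new one, never a torn file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(ckpt_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"offset": next_offset}, f)
    os.replace(tmp, ckpt_path)

def restart_offset(ckpt_path):
    """Offset from which consumption resumes after a failure."""
    if not os.path.exists(ckpt_path):
        return 0
    with open(ckpt_path) as f:
        return json.load(f)["offset"]

d = tempfile.mkdtemp()
data = os.path.join(d, "msgs.log")
ckpt = os.path.join(d, "offset.ckpt")
flush_with_checkpoint(["a", "b"], 2, data, ckpt)
print(restart_offset(ckpt))   # 2
```

Because the data flush happens before the checkpoint write, a crash between the two replays messages (at-least-once) rather than losing them; with SimpleConsumer the application can resume the fetch from `restart_offset()`.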
>> > > > >>
>> > > > >> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void.dev@gmail.com> wrote:
>> > > > >>
>> > > > >>> How can one guarantee exactly-once semantics when using Kafka as a
>> > > > >>> traditional queue? Is this guarantee the responsibility of the
>> > > > >>> consumer?
>> > > > >>>
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> *Evan Chan*
>> Senior Software Engineer |
>> ev@ooyala.com | (650) 996-4600
>> www.ooyala.com | blog <http://www.ooyala.com/blog> |
>> @ooyala<http://www.twitter.com/ooyala>
>>
