kafka-users mailing list archives

From Edward Smith <esm...@stardotstar.org>
Subject Re: Consumer State Description in design.html
Date Fri, 13 Apr 2012 19:01:37 GMT
Ack!  No!  I'm sorry, I'm probably just confusing the issue.  I just
want to clarify the docs, not change the functionality.

Maybe I'll try to sum it up the way I would write the jira:

"Design.html is confusing to new users when it comes to where offset
data is stored by consumers."


On Fri, Apr 13, 2012 at 2:48 PM, Jun Rao <junrao@gmail.com> wrote:
> Ed,
>
> It seems that you are proposing a pluggable consumer offset store. We don't
> have that now. Could you open a jira for that?
>
> Thanks,
>
> Jun
>
> On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <esmith@stardotstar.org>wrote:
>
>> Jun,
>>
>> Let me try to rephrase to see if I can get this point across more clearly.
>>
>> I've been exploring the design by running the console tools.  The
>> console consumer stores offset data in ZK.   This appears to be the
>> default behavior in a Kafka deployment.  For example, if you skip down
>> to "Consumers and Consumer Groups", it says that offsets are stored in
>> ZK.
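[As a concrete illustration of the offsets-in-ZK convention mentioned above: in the 0.7-era layout, the high-level consumer keeps each group's offsets under /consumers/<group>/offsets/<topic>/<brokerid-partition>. The helper below just builds that path; the function and names are illustrative, not part of any Kafka API.]

```python
def consumer_offset_path(group, topic, broker_id, partition):
    """Build the znode path where a consumer group's offset is kept
    (assumes the 0.7-era /consumers/... ZooKeeper layout)."""
    return "/consumers/%s/offsets/%s/%s-%s" % (group, topic, broker_id, partition)

# e.g. the console consumer's offset for partition 0 on broker 0:
print(consumer_offset_path("console-group", "test", 0, 0))
# /consumers/console-group/offsets/test/0-0
```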
>>
>> The paragraph that I want to change basically describes an
>> alternative technique for tracking offsets.  It has been confusing to
>> me as I've tried to understand the design of Kafka, so I want to see
>> if we can clarify it somehow.
>>
>> Ed
>>
>> On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <junrao@gmail.com> wrote:
>> > Ed,
>> >
>> > The design page only describes how the high level consumer (which most
>> > people use) works. The high level consumer currently doesn't expose
>> > offsets. Hadoop uses the low level consumer (SimpleConsumer), which is
>> > not described. We can have a wiki describing it and put your content there.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <esmith@stardotstar.org>wrote:
>> >
>> >> Sorry.... here it is with more clarity:
>> >>
>> >> Basically I'm adding to the beginning of the 2nd section titled
>> >> "Consumer State"
>> >>
>> >> ----------------------------------------
>> >> <h3>Consumer State</h3> (the second heading like this in the file)
>> >> <p>
>> >> In Kafka, the consumers are responsible for maintaining state
>> >> information on what has been consumed.  The core Kafka consumers
>> >> write their state data to zookeeper.
>> >> </p>
>> >> <p>
>> >> However, it may be beneficial for consumers to write state data into
>> >> the same datastore where they are writing the results of their
>> >> processing.  For example, the consumer may simply be entering some
>> >> aggregate value into a centralized......
>> >> ..
>> >> (rest of section remains the same from here)
>> >> ..
>> >> </p>
>> >> ------------------------------------------
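[A minimal sketch of the pattern the proposed paragraph describes: keeping the consumer's offset in the same datastore as its output, so both are committed in one transaction. The sqlite schema and the made-up message stream below are purely illustrative, not anything shipped with Kafka.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE agg (k TEXT PRIMARY KEY, total INTEGER)")
db.execute("CREATE TABLE state (part INTEGER PRIMARY KEY, next_offset INTEGER)")
db.execute("INSERT INTO agg VALUES ('clicks', 0)")
db.execute("INSERT INTO state VALUES (0, 0)")
db.commit()

# (offset, value) pairs standing in for messages fetched from a partition
messages = [(0, 2), (1, 3), (2, 5)]

for offset, value in messages:
    with db:  # one transaction: aggregate and offset commit or roll back together
        db.execute("UPDATE agg SET total = total + ? WHERE k = 'clicks'", (value,))
        db.execute("UPDATE state SET next_offset = ? WHERE part = 0", (offset + 1,))

total = db.execute("SELECT total FROM agg").fetchone()[0]
resume_at = db.execute("SELECT next_offset FROM state").fetchone()[0]
print(total, resume_at)  # 10 3
```

[If the process dies mid-message, neither the aggregate nor the offset advances, so a restart resumes at `resume_at` with no loss or double-counting.]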
>> >>
>> >>
>> >> On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <junrao@gmail.com> wrote:
>> >> > Ed,
>> >> >
>> >> > I don't see the change you want to make. The Apache mailing list
>> >> > doesn't take attachments. If you have attachments, the easiest way is
>> >> > probably to attach them to a jira.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <esmith@stardotstar.org>wrote:
>> >> >
>> >> >> I didn't want to open up a bug unless there was some concurrence on
>> >> >> this.  Please review the change below and see if I'm just
>> >> >> misunderstanding things or not.  This paragraph in the doc took me a
>> >> >> long time to digest because it was describing the contrib/hadoop
>> >> >> consumer and not how simpleconsumer or consoleconsumer work:
>> >> >>
>> >> >> Consumer State (the second heading like this in the file)
>> >> >>
>> >> >> In Kafka, the consumers are responsible for maintaining state
>> >> >> information on what has been consumed.  The core Kafka consumers
>> >> >> write their state data to zookeeper.
>> >> >>
>> >> >> However, it may be beneficial for consumers to write state data into
>> >> >> the same datastore where they are writing the results of their
>> >> >> processing.  For example, the consumer may simply be entering some
>> >> >> aggregate value into a centralized...... (rest of section remains the
>> >> >> same from here)
>> >> >>
>> >> >> Ed
>> >> >>
>> >> >> On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <junrao@gmail.com> wrote:
>> >> >> > Currently, as you are iterating messages returned by SimpleConsumer,
>> >> >> > you also get the offset for the next message. In the map, you can
>> >> >> > just run for 30 mins and save the next offset for the next run.
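[Jun's suggestion can be sketched as follows. The `fetch` function below is a stand-in for a SimpleConsumer fetch, not the real API: each message carries the offset to resume from, the loop stops when the time budget (30 min in the real job) is spent, and the final `next_offset` is what the following run would start from. The checkpointing itself (to HDFS, a file, or ZK) is left out.]

```python
import time

def fetch(offset, max_msgs=5):
    """Pretend broker fetch: (payload, next_offset) pairs starting at `offset`."""
    return [("msg-%d" % o, o + 1) for o in range(offset, offset + max_msgs)]

def run(start_offset, budget_secs):
    deadline = time.time() + budget_secs
    next_offset = start_offset
    processed = []
    while time.time() < deadline:
        for payload, nxt in fetch(next_offset):
            processed.append(payload)
            next_offset = nxt  # resume point, carried with each message
        if len(processed) >= 10:  # cap the demo so it terminates quickly
            break
    return processed, next_offset

msgs, resume_at = run(start_offset=0, budget_secs=1)
# A real map task would now persist `resume_at` durably and use it as the
# starting offset of the next 30-minute run.
print(len(msgs), resume_at)  # 10 10
```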
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > Jun
>> >> >> >
>> >> >> > On Fri, Apr 13, 2012 at 1:01 AM, R S <mypostboxat@gmail.com> wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> I looked at hadoop-consumer, which fetches data directly from the
>> >> >> >> kafka broker. But from what I understand, it is based on min and
>> >> >> >> max offsets, and map tasks complete once they reach the maximum
>> >> >> >> offset for a given topic.
>> >> >> >>
>> >> >> >> In our use case we would not know the max offset beforehand.
>> >> >> >> Instead, we want the map to keep reading data from a min offset
>> >> >> >> and roll over every 30 mins. At the 30th min we would again
>> >> >> >> generate the offsets to be used for the next run.
>> >> >> >>
>> >> >> >> Any suggestions would be helpful.
>> >> >> >>
>> >> >> >> regards,
>> >> >> >> rks
>> >> >> >>
>> >> >>
>> >>
>>
