kafka-users mailing list archives

From Joel Koshy <jjkosh...@gmail.com>
Subject Re: Consumer State Description in design.html
Date Fri, 13 Apr 2012 21:11:57 GMT
Jun, I think Ed is suggesting a good improvement to the design doc: line
203 on
http://svn.apache.org/viewvc/incubator/kafka/site/design.html?view=markup

That paragraph does seem to mix up the discussion between the high-level
consumer and consumers that maintain their own state for more fine-grained
"rewindability". At least, the first two lines of that paragraph seem to be
talking about the high-level consumer, but not very clearly.

Thanks,

Joel

On Fri, Apr 13, 2012 at 12:01 PM, Edward Smith <esmith@stardotstar.org> wrote:

> Ack!  No!  I'm sorry, I'm probably just confusing the issue.  I just
> want to clarify the docs, not change the functionality.
>
> Maybe I'll try to sum it up the way I would write the jira:
>
> "Design.html is confusing to new users when it comes to where offset
> data is stored by consumers."
>
>
> On Fri, Apr 13, 2012 at 2:48 PM, Jun Rao <junrao@gmail.com> wrote:
> > Ed,
> >
> > It seems that you are proposing a pluggable consumer offset store. We
> > don't have that now. Could you open a jira for that?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <esmith@stardotstar.org> wrote:
> >
> >> Jun,
> >>
> >> Let me try to rephrase to see if I can get this point across more
> >> clearly.
> >>
> >> I've been exploring the design by running the console tools.  The
> >> console consumer stores offset data in ZK.  This appears to be the
> >> default behavior in a Kafka deployment.  For example, if you skip down
> >> to "Consumers and Consumer Groups", it says that offsets are stored in
> >> ZK.
> >>
> >> The paragraph that I want to change is basically describing an
> >> alternative technique of tracking offsets.  It has been confusing to
> >> me as I've tried to understand the design of Kafka, so I want to see
> >> if we can clarify it somehow.
> >>
> >> Ed
> >>
> >> On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <junrao@gmail.com> wrote:
> >> > Ed,
> >> >
> >> > The design page only describes how the high level consumer (which most
> >> > people use) works. The high level consumer currently doesn't expose
> >> > offsets. Hadoop uses the low level consumer (SimpleConsumer), which
> >> > is not described. We can have a wiki describing it and put your
> >> > content there.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <esmith@stardotstar.org> wrote:
> >> >
> >> >> Sorry.... here it is with more clarity:
> >> >>
> >> >> Basically I'm adding to the beginning of the 2nd section titled
> >> >> "Consumer State"
> >> >>
> >> >> ----------------------------------------
> >> >> <h3>Consumer State</h3> (the second heading like this in the file)
> >> >> <p>
> >> >> In Kafka, the consumers are responsible for maintaining state
> >> >> information on what has been consumed.  The core Kafka consumers
> >> >> write their state data to zookeeper.
> >> >> </p>
> >> >> <p>
> >> >> However, it may be beneficial for consumers to write state data into
> >> >> the same datastore where they are writing the results of their
> >> >> processing.  For example, the consumer may simply be entering some
> >> >> aggregate value into a centralized......
> >> >> ..
> >> >> (rest of section remains the same from here)
> >> >> ..
> >> >> </p>
> >> >> ------------------------------------------
> >> >>
> >> >>
> >> >> On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <junrao@gmail.com> wrote:
> >> >> > Ed,
> >> >> >
> >> >> > I don't see the change you want to make. Apache mailing list
> >> >> > doesn't take attachments. If you have attachments, the easiest
> >> >> > way is probably to attach that to a jira.
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Jun
> >> >> >
> >> >> > On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <esmith@stardotstar.org> wrote:
> >> >> >
> >> >> >> I didn't want to open up a bug unless there was some concurrence
> >> >> >> on this.  Please review the change below and see if I'm just
> >> >> >> misunderstanding things or not.  This paragraph in the doc took
> >> >> >> me a long time to digest because it was describing the
> >> >> >> contrib/hadoop consumer and not how SimpleConsumer or
> >> >> >> ConsoleConsumer work:
> >> >> >>
> >> >> >> Consumer State (the second heading like this in the file)
> >> >> >>
> >> >> >> In Kafka, the consumers are responsible for maintaining state
> >> >> >> information on what has been consumed.  The core Kafka consumers
> >> >> >> write their state data to zookeeper.
> >> >> >>
> >> >> >> However, it may be beneficial for consumers to write state data
> >> >> >> into the same datastore where they are writing the results of
> >> >> >> their processing.  For example, the consumer may simply be
> >> >> >> entering some aggregate value into a centralized...... (rest of
> >> >> >> section remains the same from here)
> >> >> >>
> >> >> >> Ed
> >> >> >>
> >> >> >> On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <junrao@gmail.com> wrote:
> >> >> >> > Currently, as you are iterating messages returned by
> >> >> >> > SimpleConsumer, you also get the offset for the next message.
> >> >> >> > In the map, you can just run for 30 mins and save the next
> >> >> >> > offset for the next run.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> >
> >> >> >> > Jun
> >> >> >> >
> >> >> >> > On Fri, Apr 13, 2012 at 1:01 AM, R S <mypostboxat@gmail.com> wrote:
> >> >> >> >
> >> >> >> >> Hi,
> >> >> >> >>
> >> >> >> >> I looked at hadoop-consumer, which fetches data directly from
> >> >> >> >> the kafka broker.  But from what I understand it is based on
> >> >> >> >> min and max offsets, and map tasks complete once they reach
> >> >> >> >> the maximum offset for a given topic.
> >> >> >> >>
> >> >> >> >> In our use case we would not know about the max offset
> >> >> >> >> beforehand.  Instead we want map to keep reading data from a
> >> >> >> >> min offset and roll over every 30 mins.  At the 30th min we
> >> >> >> >> would again generate the offsets which would be used for the
> >> >> >> >> next run.
> >> >> >> >>
> >> >> >> >> Any suggestions would be helpful.
> >> >> >> >>
> >> >> >> >> regards,
> >> >> >> >> rks
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>
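[The run pattern discussed in the quoted thread above — resume from a checkpointed offset, consume for a bounded window, then save the next offset for the following run — can be sketched generically like this. fetch_messages is a stand-in for a real fetch (e.g. via SimpleConsumer) and is an assumption, not actual Kafka API; the checkpoint is just a JSON file:]

```python
import json
import os
import time

def load_offset(path):
    # First run: no checkpoint file yet, start from offset 0.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["next_offset"]
    return 0

def save_offset(path, next_offset):
    with open(path, "w") as f:
        json.dump({"next_offset": next_offset}, f)

def run_once(path, fetch_messages, handle, window_secs=30 * 60):
    # fetch_messages(offset) is assumed to return a list of
    # (next_offset, payload) pairs starting at the given offset.
    offset = load_offset(path)
    deadline = time.time() + window_secs
    while time.time() < deadline:
        batch = fetch_messages(offset)
        if not batch:
            break  # caught up with the log; stop early
        for next_offset, payload in batch:
            handle(payload)
            offset = next_offset  # offset of the NEXT message, as Jun notes
    save_offset(path, offset)  # checkpoint for the next run
    return offset
```

[Each scheduled run (e.g. every 30 minutes) calls run_once(); the max offset never needs to be known beforehand, because the run is bounded by time rather than by an end offset.]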
