kafka-users mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: Fetch messages since a specific time?
Date Mon, 17 Dec 2012 20:41:52 GMT
Individual messages do not carry a timestamp. Messages are stored in groups,
one file per group (I think the default file size is around 500 MB), and the
timestamp parameter finds the offset at the beginning of the file that has
that timestamp, which is not really helpful for your use case.
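
To illustrate why the lookup is so coarse, here is a minimal sketch of a
segment-based offset lookup. It is a simplified model, not the broker's actual
code: the `Segment` type and the `offsets_before` function are hypothetical
names, and the special timestamps -1 and -2 are not modelled. It assumes each
segment file is described only by its first offset and its modification time.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical description of one log segment file."""
    base_offset: int        # offset of the first message in the file
    last_modified_ms: int   # file modification time, ms since the epoch


def offsets_before(segments: List[Segment], timestamp_ms: int,
                   max_num: int) -> List[int]:
    """Return base offsets of segments last modified at or before
    timestamp_ms, newest first, limited to max_num entries."""
    eligible = [s.base_offset for s in segments
                if s.last_modified_ms <= timestamp_ms]
    return sorted(eligible, reverse=True)[:max_num]


# A single segment whose file was last touched by the most recent write:
# any timestamp earlier than that write matches no segment at all.
single = [Segment(base_offset=0, last_modified_ms=1355769980296)]
print(offsets_before(single, 1355769964999, 1))
```

Under this model the result can only ever be a segment boundary, never the
offset of the message nearest to the requested time.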

The accepted solution is to store the offsets in a DB, or some other
location.
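
A minimal sketch of that approach, assuming SQLite as the store. The table
name, column names and helper functions are all made up for illustration, and
the actual fetch from Kafka is left out: the idea is only to persist the next
offset to read per user, topic and partition, and to look it up at login.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical per-user offset table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_offsets ("
        " user_id TEXT, topic TEXT, partition_id INTEGER,"
        " next_offset INTEGER,"
        " PRIMARY KEY (user_id, topic, partition_id))"
    )
    return conn


def save_offset(conn, user_id, topic, partition_id, next_offset):
    """Record where this user should resume reading (e.g. at logout)."""
    conn.execute(
        "INSERT OR REPLACE INTO user_offsets VALUES (?, ?, ?, ?)",
        (user_id, topic, partition_id, next_offset),
    )
    conn.commit()


def load_offset(conn, user_id, topic, partition_id, default=0):
    """Return the saved offset, or `default` for a first-time user."""
    row = conn.execute(
        "SELECT next_offset FROM user_offsets"
        " WHERE user_id = ? AND topic = ? AND partition_id = ?",
        (user_id, topic, partition_id),
    ).fetchone()
    return row[0] if row else default
```

On login, the application would call `load_offset` and start fetching from the
returned value; a first-time user falls back to the default and reads from the
beginning.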

--Tom

On Monday, December 17, 2012, Mathias Söderberg wrote:

> Hm, alright. Haven't really used the method for anything besides getting
> first and last offset (using -1 and -2 as timestamps IIRC) of a
> topic+partition combination.
>
> Maybe someone else can shed some light on this?
>
> Cheers,
> Mathias
>
>
> On 17 December 2012 19:51, Jason Huang <jason.huang@icare.com>
> wrote:
>
> > Mathias,
> >
> > Thanks for the response. I am not sure if this timestamp is the Unix time
> > or not. I've tried the following:
> >
> > I created 3 messages on the same topic and the same partition, like this:
> > 1355769714152: Jason has a new message 1
> > 1355769964900: Jason has a new message 2
> > 1355769980296: Jason has a new message 3
> >
> > I then tried to call getOffsetsBefore with a timestamp = 1355769964999
> > (99 milliseconds after the timestamp in message two above), hoping to
> > get some offset but the long array returned by the call is empty.
> >
> > Some Google searching found that getOffsetsBefore is based on the mtime of
> > the log segments. In other words, if I only have one log file
> > 00000000000000000000.kafka in the topic directory (log/topic-0), then
> > the offset array returned by this call will always be 0?
> >
> > If so, this API is probably not designed for my use case.
> >
> > thanks,
> >
> > Jason
> >
> >
> > On Mon, Dec 17, 2012 at 1:40 PM, Mathias Söderberg
> > <mathias.soederberg@gmail.com> wrote:
> > > The SimpleConsumer API [1] has a method called getOffsetsBefore which takes
> > > a topic, partition, timestamp (UNIX I assume since it's a long) and integer
> > > limit on how many offsets to get.
> > >
> > > Might not solve your problem *exactly*, but could be useful, unless you're
> > > using the ConsumerConnector?
> > >
> > > [1]: http://people.apache.org/~joestein/kafka-0.7.1-incubating-docs/
> > >
> > >
> > > On 17 December 2012 19:23, Jason Huang <jason.huang@icare.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> Is it possible to fetch messages from the Kafka message queue since a
> > >> specific time? For example, a user may subscribe to a topic and the
> > >> producer will continuously publish messages related to this topic. The
> > >> first time this user logs in, we will fetch all the messages from the
> > >> beginning. However, the next time this user logs in, we want to only
> > >> fetch the "new" messages. In other words, messages since the user's
> > >> last log out time.
> > >>
> > >> Is there any API in Kafka that allows us to do that? I am not sure if
> > >> Kafka actually stores a timestamp with each message as the message's
> > >> meta data. If not, is there any way to fetch the offset related to the
> > >> user's last log out time?
> > >>
> > >> One way that I can think of to do this is to store the offset of the
> > >> last message this user consumed before he logged out of the system
> > >> (persist this offset in a DB). The next time this user logs in, we
> > >> will read the DB to get that offset and start from there to fetch
> > >> messages. However, if there is a better way to do this in Kafka, then
> > >> it would save me the work of writing to and reading from the DB.
> > >>
> > >> thanks!
> > >>
> > >> Jason
> > >>
> >
>
