kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: random access
Date Wed, 13 Jun 2012 18:05:23 GMT
If the access is by offset then there will be one seek (if the data doesn't
fit in memory) or no seeks (if it is cached). The pagecache will
automatically fill all free memory on the machine. If the access is by some
secondary index of key=>offset that you maintain then it will depend on the
efficiency of your index.

-Jay
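The key=>offset secondary index mentioned above can be sketched roughly like this. It is a minimal illustration, not Kafka functionality. It assumes each message body is JSON carrying a hypothetical `messageId` field, and that the consumer sees each message together with its offset. `build_index` and `lookup` are hypothetical helpers. For a year of retained messages, the index would more likely live in a persistent store than in an in-memory dict.

```python
import json
from typing import Dict, Iterable, Optional, Tuple


def build_index(messages: Iterable[Tuple[int, bytes]]) -> Dict[str, int]:
    """Map each message's messageId to the offset it was read from.

    Assumes (hypothetically) that every body is JSON with a "messageId" field.
    If a messageId repeats, the later offset wins.
    """
    index: Dict[str, int] = {}
    for offset, body in messages:
        message_id = json.loads(body)["messageId"]
        index[message_id] = offset
    return index


def lookup(index: Dict[str, int], message_id: str) -> Optional[int]:
    """Return the offset for message_id, or None if it was never indexed."""
    return index.get(message_id)


if __name__ == "__main__":
    # Made-up (offset, body) pairs standing in for what a consumer receives.
    stream = [
        (0, b'{"messageId": "123", "payload": "a"}'),
        (41, b'{"messageId": "456", "payload": "b"}'),
    ]
    idx = build_index(stream)
    print(lookup(idx, "456"))  # the offset to fetch from
```

The lookup itself is a hash-table hit. The remaining cost is the fetch by offset, which is the one seek (or zero seeks, if cached) described above.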

On Wed, Jun 13, 2012 at 7:49 AM, S Ahmed <sahmed1020@gmail.com> wrote:

> So I'll just have to create one then I guess if I want to do this.  I was
> planning on doing this:
>
> prod#1 -> kafka#1 -> consumer  -> prod#2 -> kafka#2 central
>
> kafka-central will have long lasting messages.
>
> So in the consumer that pulls off kafka#2, I will filter messages, and
> then I can create an index that maps offset to messageId.
>
> Just wondering how fast random access to a kafka file will be, like, will it
> be as fast as a db lookup? It's a memory mapped file so it should be fast
> in theory, but when the # of files grows things will degrade.
>
> On Wed, Jun 13, 2012 at 10:01 AM, Jay Kreps <jay.kreps@gmail.com> wrote:
>
> > There is no scanning, we compute the message location from the offset and
> > begin fetching there.
> >
> > Sent from my iPhone
> >
> > On Jun 13, 2012, at 6:40 AM, S Ahmed <sahmed1020@gmail.com> wrote:
> >
> > > I was thinking of replicating messages to a central location, and having
> > > a very long expire date on the messages (like say 1 year).
> > >
> > > My requirement would be being able to not just stream messages, but also
> > > access messages by key, similar to "SELECT * FROM TABLE WHERE id=123".
> > >
> > > From what I understand, there is currently no index file that maps
> > > messages to their exact location in a file, correct? i.e. Kafka streams
> > > the messages, so it goes to a .kafka file, starts from the beginning and
> > > streams the data to a consumer. If your offset happens to be in the
> > > middle of the file, it will scan the file, start at the beginning of the
> > > message, figure out the length of the message, and then jump to the
> > > position of the next message until it finds the correct message offset.
> > > Is this correct?
> > >
> > > i.e. I would have to create some sort of index that maps the offset to
> > > the 'messageId' (where the messageId is stored in the body of the
> > > message itself).
> >
>
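The "no scanning" point from the earlier reply can be illustrated with a short sketch. It assumes Kafka 0.7-era semantics: an offset is a byte position in the log, and the log is split into segment files named after the offset of their first byte. `locate` and `segment_filename` are hypothetical helpers, not Kafka APIs, and the 20-digit file naming is an assumption based on the ".kafka" file mentioned in the thread. Finding a message is then a binary search over segment base offsets plus a subtraction, with no per-message walk. The offset is assumed to be a valid message boundary, such as one previously returned to a consumer.

```python
import bisect
from typing import List, Tuple


def locate(segment_bases: List[int], offset: int) -> Tuple[int, int]:
    """Return (segment base offset, byte position within that segment).

    Assumes segment_bases is sorted ascending and that offset is a byte
    position in the overall log (Kafka 0.7-style offsets).
    """
    if not segment_bases or offset < segment_bases[0]:
        raise ValueError("offset precedes the oldest segment")
    # Binary search: rightmost segment whose base offset is <= offset.
    i = bisect.bisect_right(segment_bases, offset) - 1
    base = segment_bases[i]
    return base, offset - base


def segment_filename(base: int) -> str:
    # Hypothetical naming: base offset zero-padded to 20 digits plus ".kafka".
    return f"{base:020d}.kafka"


if __name__ == "__main__":
    segments = [0, 536870912, 1073741824]  # made-up segment base offsets
    base, position = locate(segments, 600000000)
    print(segment_filename(base), position)
```

Because the search is logarithmic in the number of segments, adding more segment files costs little. The dominant cost stays the single disk seek when the data is not already in the pagecache.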
