kafka-users mailing list archives

From <ma...@modelcitizen.com>
Subject RE: random access performance of messages.
Date Fri, 21 Oct 2011 17:19:31 GMT
Thanks for the responses, and pardon my newbie status.

@sharad
>> Also using kafka as *long* term message store is not a good usecase.

To be more specific about my message lifetime/volume: in my case storage
would be for less than one month (a few terabytes in total).

@neha
>> Instead of using kafka for random message lookups, you could use it as
>> the persistent message bus between the publishers of the messages and your
>> indexing system.

Yes, that is what I intended by the first approach I suggested.  Granted,
that is the most apparent path, but I'm trying to work out whether I can save
the time/resources needed to move the data out of Kafka into a secondary db.
In this case the only purpose of the secondary store would be to house the
message data.  If it's already in Kafka, then why not just leave it there?

@sharad
>> kafka is more suited for sequential message reads. Not really meant for
>> random message lookups.

From my basic understanding of the API, it appears that reading (using a
checkpoint) always begins with a random access.  E.g., see the code excerpt
below from the wiki quickstart.  I assume the FetchRequest() call is a random
access read?

Does that mean the initial FetchRequest() is considered to be "slow"?

Can you give a concrete number that conveys how slow "slow" actually is?

Is the concern also that too many random accesses will degrade write
performance?

Thank you.

long offset = 0;
while (true) {
  // create a fetch request for topic "test", partition 0, current offset,
  // and fetch size of 1MB
  FetchRequest fetchRequest = new FetchRequest("test", 0, offset, 1000000);

  // get the message set from the consumer and print them out
  ByteBufferMessageSet messages = consumer.fetch(fetchRequest);
  for(Message message : messages) {
    System.out.println("consumed: " + Utils.toString(message.payload(), "UTF-8"));
    // advance the offset after consuming each message
    offset += MessageSet.entrySize(message);
  }
}
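
The excerpt above advances the offset by MessageSet.entrySize(message), which
suggests that an offset is a byte position in the partition's log. Under that
assumption, the "external index that references the message in Kafka" idea can
be sketched as below. This is a self-contained illustration only: the class
OffsetLogSketch, its in-memory buffer, and the length-prefixed entry layout are
all hypothetical and are not part of the Kafka API or its on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one partition's log; not a Kafka class.
class OffsetLogSketch {
    private ByteBuffer log = ByteBuffer.allocate(1024);
    // External index: application key -> byte offset of the entry in the log.
    private final Map<String, Long> index = new HashMap<>();

    // Append a length-prefixed entry and remember where it starts.
    long append(String key, String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        if (log.remaining() < 4 + bytes.length) {
            grow(4 + bytes.length);
        }
        long offset = log.position();
        log.putInt(bytes.length).put(bytes);
        index.put(key, offset);
        return offset;
    }

    // Positional ("random access") read of the single entry starting at offset.
    String readAt(long offset) {
        ByteBuffer view = log.duplicate();
        view.position((int) offset);
        byte[] bytes = new byte[view.getInt()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Query-by-key: one index lookup, then one positional read.
    String lookup(String key) {
        Long offset = index.get(key);
        return offset == null ? null : readAt(offset);
    }

    // Copy the written bytes into a larger buffer when the log is full.
    private void grow(int needed) {
        int capacity = Math.max(log.capacity() * 2, log.position() + needed);
        ByteBuffer bigger = ByteBuffer.allocate(capacity);
        log.flip();
        bigger.put(log);
        log = bigger;
    }

    public static void main(String[] args) {
        OffsetLogSketch sketch = new OffsetLogSketch();
        sketch.append("order-1", "first payload");
        sketch.append("order-2", "second payload");
        System.out.println(sketch.lookup("order-2"));
    }
}
```

Each lookup here is one seek plus one small read. On a real broker the cost
of that seek depends on whether the log segment is in the page cache or has to
come from disk, and this sketch does not measure that.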

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkhede@gmail.com] 
Sent: Friday, October 21, 2011 1:02 PM
To: kafka-users@incubator.apache.org
Subject: Re: random access performance of messages.

Marko,

I agree with Sharad. Instead of using kafka for random message lookups, you
could use it as the persistent message bus between the publishers of the
messages and your indexing system.
Using the low level consumer API (SimpleConsumer), you could set up your
indexer processes to pull from the broker partitions for a topic.
You would have to checkpoint your Kafka
offsets to match the data indexed and flushed to disk, and re-fetch data
from Kafka, if/when the indexer fails.

Thanks,
Neha
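
The checkpointing described above (store the Kafka offset together with the
data that has been indexed and flushed, and re-fetch from that offset after a
failure) can be sketched roughly as follows. This is a minimal simulation, not
SimpleConsumer code: the log is a plain in-memory list, offsets are list
positions rather than byte offsets, and CheckpointSketch, Store, and the
crashAfter parameter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CheckpointSketch {
    // Hypothetical durable state: flushed index data plus the matching offset.
    static final class Store {
        final List<String> flushed = new ArrayList<>();
        long checkpoint = 0;
    }

    // Consume from the stored checkpoint, buffering messages in memory.
    // Every batchSize messages the buffer is flushed and the checkpoint is
    // advanced in the same step. crashAfter simulates a failure after that
    // many messages have been consumed in this run (negative means never).
    static void run(List<String> log, Store store, int batchSize, int crashAfter) {
        List<String> buffer = new ArrayList<>();
        long offset = store.checkpoint;
        int consumed = 0;
        while (offset < log.size()) {
            if (consumed == crashAfter) {
                return; // simulated crash: the unflushed buffer is lost
            }
            buffer.add(log.get((int) offset));
            offset++;
            consumed++;
            if (buffer.size() == batchSize) {
                store.flushed.addAll(buffer);
                store.checkpoint = offset;
                buffer.clear();
            }
        }
        // End of log reached: flush whatever is left.
        store.flushed.addAll(buffer);
        store.checkpoint = offset;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3", "m4");
        Store store = new Store();
        run(log, store, 2, 3);   // fails with one message buffered but unflushed
        run(log, store, 2, -1);  // restart: resumes from the stored checkpoint
        System.out.println(store.flushed + " checkpoint=" + store.checkpoint);
    }
}
```

Because the checkpoint only moves when data is flushed, a restart re-reads the
messages that were buffered but not yet flushed, so nothing is skipped. In a
real indexer the flush and the checkpoint write would need to be atomic (or the
indexing idempotent) to avoid duplicates.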

On Fri, Oct 21, 2011 at 9:47 AM, Sharad Agarwal <sharad.apache@gmail.com>
wrote:
> kafka is more suited for sequential message reads. Not really meant 
> for random message lookups.
>
> Also using kafka as *long* term message store is not a good usecase.
>
> On Fri, Oct 21, 2011 at 9:32 PM, <marko@modelcitizen.com> wrote:
>
>> I would like to use Kafka to process messages that need to be 
>> immutably stored for N days, and during that period the msgs need 
>> to be indexed and searched, with the data of queried msgs retrievable.
>>
>>
>>
>> One approach is to read messages from Kafka and store the messages in 
>> a secondary db for query and data retrieval.  Once the messages are 
>> read and processed into the secondary db, then the messages can be 
>> discarded from the Kafka queue.
>>
>>
>>
>> Another approach is to read the messages and build an external index for 
>> searching that directly references the message data by Kafka-key in 
>> the Kafka queue itself.  In this case Kafka becomes the message 
>> store for the life of the message/data.
>>
>>
>>
>> The latter would be ideal for me if the performance of query-by-key 
>> and message data retrieval is very good.
>>
>>
>>
>> Is random query of message+data good for Kafka?  Is this an 
>> appropriate usecase for Kafka?
>>
>>
>>
>> Thank you.
>>
>>
>>
>> Marko.
>
>
> --
> Thanks
> Sharad Agarwal
> Hadoop and Avro Committer
> Technology Platforms, InMobi
> *Disclaimer: Opinions expressed here are my own and do not represent 
> past or present employers.*
>

