kafka-users mailing list archives

From Joel Koshy <jjkosh...@gmail.com>
Subject Re: Consumer Multi-Fetch
Date Thu, 06 Mar 2014 02:46:08 GMT
On Thu, Mar 06, 2014 at 02:27:34AM +0000, Sandon Jacobs wrote:
> I understand replication uses a multi-fetch concept to maintain the
> replicas of each partition. I have a use case where it might be
> beneficial to grab a “batch” of messages from a kafka topic and
> process them as one unit into a source system – in my use case,
> sending the messages to a Flume source.
> 
> My questions:
> 
>   *   Is it possible to fetch a batch of messages in which you may
>       not know the exact message size?

The high-level consumer actually uses multi-fetch. You will need to
have some idea of the max message size and set your fetch size
accordingly. Unfortunately, if you are consuming a very large number
of topics, this can increase the memory requirements of the consumer.
We intend to address this in the consumer rewrite - there is a
separate design review thread on that.
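
For example, a minimal sketch against the 0.8 high-level consumer API
(the group id and sizes here are made up - the point is that
fetch.message.max.bytes should be at least the brokers'
message.max.bytes):

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    Properties props = new Properties();
    props.put("zookeeper.connect", "localhost:2181");
    props.put("group.id", "flume-bridge"); // hypothetical group id
    // Must be >= the largest message the brokers will accept
    // (broker-side message.max.bytes), or the consumer will stall on
    // the first oversized message.
    props.put("fetch.message.max.bytes", "2097152"); // 2 MB
    ConsumerConnector consumer =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));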

>   *   If so, how are the offsets managed?

The consumer essentially pre-fetches and queues the chunks in memory,
and the offsets are not incremented/check-pointed until the
application thread actually iterates over the messages.
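
In code that looks roughly like this (continuing the sketch above;
the topic name is assumed and process() is a placeholder for your
Flume handoff):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.message.MessageAndMetadata;

    Map<String, Integer> topicCounts =
        Collections.singletonMap("my-topic", 1); // one stream
    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        consumer.createMessageStreams(topicCounts);
    ConsumerIterator<byte[], byte[]> it =
        streams.get("my-topic").get(0).iterator();
    while (it.hasNext()) {
        MessageAndMetadata<byte[], byte[]> msg = it.next();
        // The consumed offset advances only as next() is called; with
        // auto.commit.enable=true it is check-pointed to ZooKeeper
        // periodically (auto.commit.interval.ms).
        process(msg.message());
    }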

> I am trying to avoid queuing them in memory and batching in my process for several reasons.

The high-level consumer does queuing as described above, but you can
reduce the number of queued chunks.
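
The relevant knob is queued.max.message.chunks (again continuing the
sketch above; the memory figure is a rough bound, not an exact one):

    // Each queued chunk can hold up to a full fetch's worth of data,
    // so consumer memory is roughly
    //   queued.max.message.chunks * fetch.message.max.bytes * #streams.
    props.put("queued.max.message.chunks", "2");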

Joel

