kafka-users mailing list archives

From Eric Sites <Eric.Si...@threattrack.com>
Subject Re: Very low volume topic
Date Wed, 14 Aug 2013 02:13:03 GMT
Responses inline

On 8/13/13 9:57 PM, "Philip O'Toole" <philip@loggly.com> wrote:

>My experience is solely with 0.72. More inline.

I am currently using 0.8.

>On Tue, Aug 13, 2013 at 6:47 PM, Eric Sites <Eric.Sites@threattrack.com>
>> Hello everyone,
>> I have a very low volume topic that has 2 consumers in the same group.
>>How do I get each consumer to only consume 1 message at a time and, if
>>the first consumer is busy, get the other consumer to consume the
>You can't, not if you only have one partition. Each consumer is
>dedicated to a single partition. Unless you deliberately tear down the
>consumer and let another take over that partition (if you are using
>the high-level consumer).

I am using multiple partitions, currently 4 partitions.

>> Currently what I am doing is:
>> First consumer connects to Kafka waits for 300 milliseconds then
>>disconnects, waits for 10 seconds, then reconnects to see if there is a
>>waiting message.
>I don't think you need to do this. The high-level consumer has an API
>that allows you to set this timeout (I think).

I am using that timeout on the high-level consumer; that is the 300
millisecond wait period. Then I call consumer.shutdown(), wait 10 seconds,
and reconnect.

>> The messages kick off a long task on each server, each server can
>>handle multiple tasks up to a limit so first I am trying to balance the
>>tasks across multiple servers and if they are maxed out don't consume
>>any messages.
>> This will give the other server or servers a chance to pick up a
>>message and do the task.
>> I would not disconnect if I can ensure I don't have messages waiting in
>>the queue for a server to consume them without the other servers being
>>able to see them.
>I think a better design would be to have a basic consumer that drains
>the topic and hands jobs to the set of available workers. *Those*
>workers perform the long-running job. Only if there are no available
>workers does the consumer block. You may be trying to do too much in
>the consumer.
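
One way to read the suggestion above, as a minimal sketch only: a single dispatcher takes a worker slot before it pulls the next job, so it blocks only when every worker is busy. This uses no Kafka client API. A plain BlockingQueue stands in for the topic, and JobDispatcher, drain, freeSlots and the 2-slot pool are hypothetical names and values chosen for illustration. The 300 ms poll timeout mirrors the wait period mentioned in this thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the names here are illustrative, not part of any Kafka API.
class JobDispatcher {
    private final Semaphore freeSlots;      // one permit per idle worker
    private final ExecutorService workers;  // runs the long jobs

    JobDispatcher(int slots) {
        this.freeSlots = new Semaphore(slots);
        this.workers = Executors.newFixedThreadPool(slots);
    }

    // Pulls jobs until the queue stays empty for pollTimeoutMs.
    // Returns the number of jobs handed to workers.
    int drain(BlockingQueue<Runnable> topic, long pollTimeoutMs) throws InterruptedException {
        int dispatched = 0;
        while (true) {
            freeSlots.acquire();  // blocks only when all workers are busy
            Runnable job = topic.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            if (job == null) {
                freeSlots.release();  // nothing waiting: give the slot back
                return dispatched;
            }
            workers.execute(() -> {
                try {
                    job.run();
                } finally {
                    freeSlots.release();  // worker is idle again
                }
            });
            dispatched++;
        }
    }

    // Stops accepting jobs and waits for running ones to finish.
    boolean shutdown(long timeoutMs) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> topic = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) {
            final int id = i;
            topic.add(() -> System.out.println(
                "job " + id + " on " + Thread.currentThread().getName()));
        }
        JobDispatcher dispatcher = new JobDispatcher(2);
        int n = dispatcher.drain(topic, 300);
        dispatcher.shutdown(5000);
        System.out.println("dispatched " + n);
    }
}
```

Because the slot is acquired before the poll, a job is never taken off the queue unless a worker is free to run it. Whether this maps cleanly onto a real consumer depends on how offsets are committed, which this sketch does not model.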

The available workers are entire servers that can produce lots of network
IO and generate 100k+ Kafka messages to other Kafka topics that get
consumed by Hadoop and other systems.

I used Kafka for these start job messages because I was already using
Kafka for other messages, and I will most likely add more servers to
consume these start job messages.

I don't know how long the job will take until I consume the start job
message. Sometimes it may only take seconds or could take hours.

I have a managed thread pool that only allows x number of task types to
run at one time from each job, so that one job does not overwhelm a single
server. This allows a server to handle multiple things while waiting on
the network IO.

My only issue is balancing the job start messages across multiple servers
depending on each server's load and available threads in the thread pool.

The only real issue I am currently having is that I think this frequent
connect/disconnect is causing issues on the Kafka servers with rebalancing
the 4 partitions back and forth between the worker servers.

>> Thanks for the help...
>> Cheers,
>> Eric Sites

- Eric Sites
