storm-user mailing list archives

From Filipa Moura <filipa.mendesmo...@gmail.com>
Subject Re: kafkaspout is very slow
Date Wed, 04 Feb 2015 21:58:54 GMT
How many messages are you reading per second?
I had a few problems with my spout originally, each with one of these causes:
1) I was not acking the messages, so because of max spout pending they were
never removed from the "queue".
2) The buffer size and fetch size were too small. Have you tried to figure out
how many bytes you read from Kafka and increase the sizes to match?
This helped in my case.
3) I was trying to read too far into the past when I restarted the topology,
so I ended up consuming only from the latest offset.

With the above tweaks I was able to increase my throughput ninefold. It
obviously depends on the size of the messages, but this helped me.
As Haralds suggested, have a look at the dashboard and try to understand
where the problem is.
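
For point 2, the sizing arithmetic can be sketched roughly as below. This is only an illustration: the class name, the method name, and the choice to round up to a whole MiB are assumptions, not anything from storm-kafka. In the storm-kafka versions of that era the result would typically be assigned to the spout config's fetch size and buffer size settings; check the field names in your version.

```java
public class FetchSizing {
    private static final long MIB = 1024L * 1024L;

    /**
     * Hypothetical helper: estimates a fetch/buffer size large enough to hold
     * messagesPerFetch messages of avgMessageBytes each, rounded up to a
     * whole number of MiB.
     */
    public static int recommendedFetchBytes(int avgMessageBytes, int messagesPerFetch) {
        if (avgMessageBytes <= 0 || messagesPerFetch <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long needed = (long) avgMessageBytes * messagesPerFetch;
        long rounded = ((needed + MIB - 1) / MIB) * MIB; // round up to next MiB
        return Math.toIntExact(rounded); // throws if the result does not fit in an int
    }

    public static void main(String[] args) {
        // Example numbers only: 2 KiB messages, 1000 messages per fetch.
        int bytes = recommendedFetchBytes(2048, 1000);
        System.out.println("suggested fetch/buffer size in bytes: " + bytes);
    }
}
```

Measure the real average message size on your topic before plugging in numbers; the values in main are placeholders.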


On Wed, Feb 4, 2015 at 9:26 PM, Haralds Ulmanis <haralds@evilezh.net> wrote:

> I'm not sure that I understand your problem, but here are a few points:
> If you have a large max spout pending and slow processing, you will
> probably see large latency at the Kafka spout. The spout emits a message,
> it stays in the queue for a long time (which adds latency), and finally it
> is processed and the ack is received. You will see queue time + processing
> time in the Kafka spout latency.
> Take a look at the load factors of your bolts: are they close to 1 or more?
> Also check the load factor of the Kafka spout.
>
> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yegorov@gmail.com>
> wrote:
>
>> Have you tried increasing the max spout pending parameter for the spout?
>>
>> builder.setSpout("kafka",
>>                        new KafkaSpout(spoutConfig),
>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>           // the maximum parallelism you can have on a KafkaSpout is the
>>           // number of partitions
>>           .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>
>> ----------
>> Andrey Yegorov
>>
>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteahouse@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> In my topology,  kafka spout is responsible for over 85% of the latency.
>>> I have tried different spout max pending and played with the buffer size
>>> and fetch size, still no luck. Any hint on how to optimize the spout? The
>>> issue doesn't seem to be with the kafka side, as I see high throughput with
>>> the simple kafka consumer.
>>>
>>> thank you for your feedback
>>> Clay
>>>
>>>
>>
>
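
Haralds's point about queue time adding to the spout latency can be put into rough numbers with Little's law (average time in the system = items in the system / throughput). The sketch below assumes the spout is sitting at its max spout pending limit and the topology completes tuples at a steady rate; the class and method names are made up for illustration.

```java
public class SpoutLatencyEstimate {
    /**
     * Little's law estimate of the average time a tuple spends between emit
     * and ack, in milliseconds, when pendingTuples are in flight and the
     * topology completes tuplesPerSecond of them.
     */
    public static double completeLatencyMs(int pendingTuples, double tuplesPerSecond) {
        if (pendingTuples < 0 || tuplesPerSecond <= 0) {
            throw new IllegalArgumentException("need pendingTuples >= 0 and tuplesPerSecond > 0");
        }
        return pendingTuples / tuplesPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        // Example numbers only: 10000 pending tuples, 2000 tuples acked per second.
        System.out.println(completeLatencyMs(10000, 2000.0) + " ms");
        // Same throughput with a smaller pending limit gives a lower latency figure.
        System.out.println(completeLatencyMs(1000, 2000.0) + " ms");
    }
}
```

If the bolts are the bottleneck, raising max spout pending mostly raises the latency reported at the spout rather than the throughput, which matches the queue time + processing time description in the quoted message.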
