storm-user mailing list archives

From Filipa Moura <filipa.mendesmo...@gmail.com>
Subject Re: kafkaspout is very slow
Date Wed, 04 Feb 2015 23:47:45 GMT
I would bump these numbers up by a lot:

kafka.fetch.size.bytes: 102400
kafka.buffer.size.bytes: 102400

Say 10 or 100 times that, or more. I don't remember offhand how much I increased
those numbers in my topology.

How many bytes are you writing to Kafka per minute? Try dumping one minute
of messages to a file to figure out how many bytes that is.
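A back-of-envelope sketch of the sizing arithmetic suggested above: bytes written per minute, and how many fetch requests per second (summed over all partitions) a given fetch size implies at 10x and 100x the current value. The 100,000 records/s rate and the 102,400-byte fetch size come from this thread; the 200-byte average record size is purely hypothetical, so substitute the figure you get from dumping a minute of messages.

```java
// Back-of-envelope fetch sizing.
// ASSUMPTION: avgRecordBytes = 200 is HYPOTHETICAL; measure your own records.
final class FetchSizing {

    /** Bytes written to the topic per minute. */
    static long bytesPerMinute(long recordsPerSecond, long avgRecordBytes) {
        return recordsPerSecond * avgRecordBytes * 60L;
    }

    /** Fetch requests per second (all partitions combined) needed to keep up, rounded up. */
    static long fetchesPerSecond(long recordsPerSecond, long avgRecordBytes, long fetchSizeBytes) {
        long bytesPerSecond = recordsPerSecond * avgRecordBytes;
        return (bytesPerSecond + fetchSizeBytes - 1) / fetchSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long recordsPerSecond = 100_000; // rate reported in the thread
        long avgRecordBytes = 200;       // HYPOTHETICAL average record size
        long currentFetch = 102_400;     // kafka.fetch.size.bytes from the thread

        System.out.println("bytes/minute: " + bytesPerMinute(recordsPerSecond, avgRecordBytes));
        for (long factor : new long[] {1, 10, 100}) {
            long fetch = currentFetch * factor;
            System.out.println("fetch=" + fetch + " -> fetches/s: "
                    + fetchesPerSecond(recordsPerSecond, avgRecordBytes, fetch));
        }
    }
}
```

With these hypothetical inputs the current 102,400-byte fetch size needs roughly 196 fetches per second to keep up, while 100x that size needs only 2, which is the intuition behind bumping the value by a lot.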
I am sending data to the topic at about 100,000 records per second.
My Kafka consumer can consume the 3 million records in less than 50
seconds. I have disabled acking: with acking enabled, I don't even
get 1,500 records per second out of the topology; with acking disabled, I get
about 12,000 records per second.
I don't lose any data; it is just that the data is emitted from the spout to
the bolt very slowly.
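The rates above, put side by side in one unit. Every number is taken from this message; "less than 50 seconds" for 3 million records means the plain consumer manages at least 60,000 records per second. The backlog line assumes, as a simplification, that the producer sustains 100,000 records per second while the spout runs with acking enabled.

```java
// Throughput figures from the thread, converted to records per second.
final class ThroughputGap {

    /** Average rate in records per second. */
    static double perSecond(long records, double seconds) {
        return records / seconds;
    }

    public static void main(String[] args) {
        double produced = 100_000;                       // records/s sent to the topic
        double plainConsumer = perSecond(3_000_000, 50); // lower bound: "less than 50 seconds"
        double spoutNoAck = 12_000;                      // records/s, acking disabled
        double spoutAck = 1_500;                         // records/s, acking enabled

        System.out.printf("plain consumer: at least %.0f rec/s%n", plainConsumer);
        System.out.printf("consumer vs spout, no ack: %.1fx%n", plainConsumer / spoutNoAck);
        System.out.printf("consumer vs spout, ack:    %.1fx%n", plainConsumer / spoutAck);
        // ASSUMPTION: producer keeps up 100,000 rec/s; unread backlog grows by the difference.
        System.out.printf("backlog growth with ack: %.0f rec/s%n", produced - spoutAck);
    }
}
```

So the plain consumer is at least 5x faster than the spout without acking and at least 40x faster with it, which points at the spout/topology path rather than at Kafka itself.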

I did bump my buffer sizes, but I am not sure whether they are sufficient.

    topology.transfer.buffer.size: 2048
    topology.executor.receive.buffer.size: 65536
    topology.receiver.buffer.size: 16
    topology.executor.send.buffer.size: 65536

    kafka.fetch.size.bytes: 102400
    kafka.buffer.size.bytes: 102400
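A small sanity check of the Storm buffer settings listed above. As we understand Storm 0.9.x, the transfer, executor receive and executor send queues are Disruptor ring buffers whose sizes must be powers of two (Storm's own key for the receive queue is `topology.executor.receive.buffer.size`), while `topology.receiver.buffer.size` is a batch size and is not checked here. Treat the power-of-two requirement as an assumption to confirm against the `Config` javadoc for your Storm version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// ASSUMPTION: these three queues are Disruptor-backed and require power-of-two sizes.
final class BufferCheck {

    static boolean isPowerOfTwo(long n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        Map<String, Long> disruptorQueues = new LinkedHashMap<>();
        disruptorQueues.put("topology.transfer.buffer.size", 2048L);
        disruptorQueues.put("topology.executor.receive.buffer.size", 65536L);
        disruptorQueues.put("topology.executor.send.buffer.size", 65536L);

        for (Map.Entry<String, Long> e : disruptorQueues.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue()
                    + (isPowerOfTwo(e.getValue()) ? " (ok)" : " (NOT a power of two)"));
        }
    }
}
```

All three posted values pass, so buffer sizing by itself is unlikely to explain the slowdown; a misspelled key, however, would simply be ignored.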

thanks
Clay

On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <filipa.mendesmoura@gmail.com>
wrote:

> Can you share a screenshot of the Storm UI for your spout?
>
> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <clayteahouse@gmail.com>
> wrote:
>
>> I have this issue with any amount of load. Different max spout pending
>> values do not seem to make much of a difference. I've lowered this
>> parameter to 100, and there is still little difference. At this point, the
>> bolt consuming the data does no processing.
>>
>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <haralds@evilezh.net>
>> wrote:
>>
>>> I'm not sure that I understand your problem, but here are a few points:
>>> If you have a large max spout pending and slow processing, you will
>>> probably see large latency at the Kafka spout. The spout emits a message,
>>> it stays in a queue for a long time (which adds latency), and finally it is
>>> processed and the ack is received. The Kafka spout latency therefore
>>> includes queue time + processing time.
>>> Take a look at the load factors of your bolts: are they close to 1 or
>>> more? Check the load factor of the Kafka spout as well.
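Two rules of thumb behind the points above, as a sketch. The first is Little's law: tuples in flight = throughput x latency, and with acking on, the tuples in flight per spout task are capped by max spout pending, so throughput is at most maxSpoutPending x spoutTasks / completeLatency. The second is a bolt's load factor ("capacity" in the Storm UI), which as we understand it is roughly executed x executeLatency / window length. All inputs in `main` are hypothetical.

```java
// Rules of thumb for reading Storm UI latency and load numbers.
final class SpoutMath {

    /** Little's law bound: tuples/s <= maxSpoutPending * spoutTasks / completeLatency. */
    static double maxThroughput(int maxSpoutPending, int spoutTasks, double completeLatencyMs) {
        return maxSpoutPending * (double) spoutTasks * 1000.0 / completeLatencyMs;
    }

    /** Approximate capacity: fraction of the window a bolt spent executing tuples. */
    static double capacity(long executed, double executeLatencyMs, long windowSeconds) {
        return executed * executeLatencyMs / (windowSeconds * 1000.0);
    }

    public static void main(String[] args) {
        // HYPOTHETICAL: 1 spout task, max spout pending 100, 50 ms complete latency
        System.out.printf("throughput cap: %.0f tuples/s%n", maxThroughput(100, 1, 50.0));
        // HYPOTHETICAL: 540,000 tuples executed in a 600 s window at 1.0 ms each
        System.out.printf("capacity: %.2f%n", capacity(540_000, 1.0, 600));
    }
}
```

Read the other way, with max spout pending 100 on a single spout task, 1,500 tuples/s corresponds to a complete latency of about 67 ms; lowering max spout pending shortens queue time but also lowers this ceiling. A capacity near or above 1.0 means the bolt is saturated.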
>>>
>>> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yegorov@gmail.com>
>>> wrote:
>>>
>>>> Have you tried increasing the max spout pending parameter for the spout?
>>>>
>>>> builder.setSpout("kafka",
>>>>                  new KafkaSpout(spoutConfig),
>>>>                  TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>        // the maximum parallelism you can have on a KafkaSpout is the
>>>>        // number of partitions
>>>>        .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
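Why spout parallelism beyond the partition count does not help, sketched without any Storm dependency. The assumption here is that partitions are dealt round-robin to spout tasks (partition p goes to task p % totalTasks), which is how we understand storm-kafka's static partition assignment; the partition and task counts in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// ASSUMPTION: round-robin static assignment of partitions to spout tasks.
final class PartitionAssignment {

    /** Partitions read by the spout task with index taskIndex (0-based). */
    static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            assigned.add(p);
        }
        return assigned;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // HYPOTHETICAL topic with 4 partitions
        int totalTasks = 6;    // HYPOTHETICAL spout parallelism
        for (int t = 0; t < totalTasks; t++) {
            System.out.println("task " + t + " -> " + partitionsForTask(numPartitions, totalTasks, t));
        }
    }
}
```

With 4 partitions and 6 tasks, tasks 4 and 5 get no partitions and sit idle, so adding partitions to the topic is what raises the spout's usable parallelism.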
>>>>
>>>> ----------
>>>> Andrey Yegorov
>>>>
>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteahouse@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> In my topology, the Kafka spout is responsible for over 85% of the
>>>>> latency. I have tried different max spout pending values and played
>>>>> with the buffer size and fetch size, still with no luck. Any hints on
>>>>> how to optimize the spout? The issue doesn't seem to be on the Kafka
>>>>> side, as I see high throughput with the simple Kafka consumer.
>>>>>
>>>>> thank you for your feedback
>>>>> Clay
>>>>>
>>>>>
>>>>
>>>
>>
>
