storm-user mailing list archives

From clay teahouse <clayteaho...@gmail.com>
Subject Re: kafkaspout is very slow
Date Thu, 05 Feb 2015 03:20:50 GMT
I bumped the kafka buffer/fetch sizes to

kafka.fetch.size.bytes:  12582912
kafka.buffer.size.bytes: 12582912

The throughput almost doubled (to about 23,000 un-acked tuples/second).
Increasing these two parameters beyond that does not seem to improve
performance any further. Is there anything else that I can try?
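
For scale, here is a rough sketch of what these sizes mean per fetch, assuming the figure quoted below of about 12 MB per 100,000 records (roughly 126 bytes per record). The class and method names are illustrative only and are not part of Storm or Kafka:

```java
public class FetchSizing {
    // Average record size in bytes, rounded to the nearest byte.
    static long avgRecordBytes(long totalBytes, long records) {
        return Math.round((double) totalBytes / records);
    }

    // Number of whole records of the given size that fit in one fetch.
    static long recordsPerFetch(long fetchBytes, long recordBytes) {
        return fetchBytes / recordBytes;
    }

    public static void main(String[] args) {
        // Assumption: 100,000 records occupy about 12 MB (taken here as 12 * 1024 * 1024 bytes).
        long recordBytes = avgRecordBytes(12L * 1024 * 1024, 100_000);
        System.out.println("avg record bytes: " + recordBytes);
        // Original setting: kafka.fetch.size.bytes = 102400
        System.out.println("records per 102400-byte fetch: "
                + recordsPerFetch(102_400, recordBytes));
        // New setting: kafka.fetch.size.bytes = 12582912
        System.out.println("records per 12582912-byte fetch: "
                + recordsPerFetch(12_582_912, recordBytes));
    }
}
```

Under that assumption the original 102400-byte fetch holds only about 800 records, while the 12582912-byte fetch holds roughly one second of input at 100,000 records/second, which may be why larger values stop helping.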

On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <clayteahouse@gmail.com>
wrote:

> 100,000 records is about 12MB.
> I'll try bumping the numbers 100-fold to see if it makes any
> difference.
> thanks,
> -Clay
>
> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <filipa.mendesmoura@gmail.com
> > wrote:
>
>> I would bump these numbers up by a lot:
>>
>>     kafka.fetch.size.bytes: 102400
>>     kafka.buffer.size.bytes: 102400
>>
>> Say 10 or 100 times that or more. I don't know by heart how much I
>> increased those numbers on my topology.
>>
>> How many bytes are you writing per minute to Kafka? Try dumping 1 minute
>> of messages to a file to figure out how many bytes that is.
>> I am reading (sending data to the topic) about 100,000 records per
>> second. My kafka consumer can consume the 3 million records in less than
>> 50 seconds. I have disabled the ack. But with the ack enabled, I don't even
>> get 1500 records per second from the topology. With ack disabled, I get
>> about 12000/second.
>> I don't lose any data; it is just that the data is emitted from the spout
>> to the bolt very slowly.
>>
>>  I did bump my buffer sizes but I am not sure if they are sufficient.
>>
>>     topology.transfer.buffer.size: 2048
>>     topology.executor.buffer.size: 65536
>>     topology.receiver.buffer.size: 16
>>     topology.executor.send.buffer.size: 65536
>>
>>     kafka.fetch.size.bytes: 102400
>>     kafka.buffer.size.bytes: 102400
>>
>> thanks
>> Clay
>>
>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <
>> filipa.mendesmoura@gmail.com> wrote:
>>
>>> can you share a  screenshot of the Storm UI for your spout?
>>>
>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <clayteahouse@gmail.com>
>>> wrote:
>>>
>>>>  I have this issue with any amount of load. Different max spout
>>>> pending values do not seem to make much of a difference. I've lowered this
>>>> parameter to 100, and there is still little difference. At this point the
>>>> bolt consuming the data does no processing.
>>>>
>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <haralds@evilezh.net>
>>>> wrote:
>>>>
>>>>> I'm not sure that I understand your problem, but here are a few points:
>>>>> If you have a large max spout pending and slow processing, you will
>>>>> probably see large latency at the kafka spout. The spout emits a message,
>>>>> it stays in the queue for a long time (which adds latency), and finally it
>>>>> is processed and the ack is received. You will see queue time + processing
>>>>> time in the kafka spout latency.
>>>>> Take a look at the load factors of your bolts (are they close to 1 or
>>>>> more?) and at the load factor of the kafka spout.
>>>>>
>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yegorov@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> have you tried increasing max spout pending parameter for the spout?
>>>>>>
>>>>>> builder.setSpout("kafka",
>>>>>>                        new KafkaSpout(spoutConfig),
>>>>>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>           // the maximum parallelism you can have on a KafkaSpout is
>>>>>>           // the number of partitions
>>>>>>           .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>
>>>>>> ----------
>>>>>> Andrey Yegorov
>>>>>>
>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteahouse@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> In my topology, kafka spout is responsible for over 85% of the
>>>>>>> latency. I have tried different spout max pending and played with the
>>>>>>> buffer size and fetch size, still no luck. Any hint on how to optimize
>>>>>>> the spout? The issue doesn't seem to be with the kafka side, as I see
>>>>>>> high throughput with the simple kafka consumer.
>>>>>>>
>>>>>>> thank you for your feedback
>>>>>>> Clay
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
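
The relationship between max spout pending, throughput, and latency discussed earlier in the thread can be sketched with Little's law (tuples in flight = throughput x complete latency). This is a back-of-the-envelope model, not Storm's own accounting. The class and method names, the in-flight count of 3,000, and the 0.5-second latency are hypothetical; the 1,500 records/second figure is the acked rate reported above.

```java
public class SpoutPendingSketch {
    // Little's law rearranged: latency = tuples in flight / throughput.
    static double impliedLatencySeconds(long tuplesInFlight, double tuplesPerSecond) {
        return tuplesInFlight / tuplesPerSecond;
    }

    // Upper bound on acked throughput for a given in-flight cap and complete latency.
    static double maxThroughput(long tuplesInFlight, double completeLatencySeconds) {
        return tuplesInFlight / completeLatencySeconds;
    }

    public static void main(String[] args) {
        // Hypothetical: 3,000 tuples pending while the topology acks 1,500 tuples/second.
        System.out.println(impliedLatencySeconds(3000, 1500.0)); // prints 2.0
        // Hypothetical: pending capped at 100 with a 0.5 s complete latency.
        System.out.println(maxThroughput(100, 0.5)); // prints 200.0
    }
}
```

In this model, raising max spout pending only lengthens the queue (and the latency attributed to the spout) once the downstream rate is the limit, whereas a very low cap bounds throughput directly.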
