storm-user mailing list archives

From clay teahouse <clayteaho...@gmail.com>
Subject Re: kafkaspout is very slow
Date Thu, 05 Feb 2015 11:56:40 GMT
CPU is around 100%

On Wed, Feb 4, 2015 at 9:30 PM, Michael Rose <michael@fullcontact.com>
wrote:

> How does your CPU look at 23000 tuples/s? Still low?
>
> Have you profiled to see if anything is blocking? Is your spout constantly
> doing work?
>
> *Michael Rose*
> Senior Platform Engineer
> *Full*Contact | fullcontact.com
> <https://www.fullcontact.com/?utm_source=FullContact%20-%20Email%20Signatures&utm_medium=email&utm_content=Signature%20Link&utm_campaign=FullContact%20-%20Email%20Signatures>
> m: +1.720.837.1357 | t: @xorlev
>
> On Wed, Feb 4, 2015 at 8:20 PM, clay teahouse <clayteahouse@gmail.com>
> wrote:
>
>> I bumped the kafka buffer/fetch sizes to
>>
>> kafka.fetch.size.bytes:  12582912
>> kafka.buffer.size.bytes: 12582912
>>
>> The throughput almost doubled (to about 23000 un-acked tuples/second).
>> Increasing these two parameters further does not seem to improve
>> performance. Is there anything else that I can try?
>>
>> On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <clayteahouse@gmail.com>
>> wrote:
>>
>>> 100,000 records is about 12MB.
>>> I'll try bumping the numbers 100-fold to see if it makes any
>>> difference.
>>> thanks,
>>> -Clay
>>>
>>> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <
>>> filipa.mendesmoura@gmail.com> wrote:
>>>
>>>> I would bump these numbers up by a lot:
>>>>
>>>> kafka.fetch.size.bytes: 102400
>>>> kafka.buffer.size.bytes: 102400
>>>>
>>>> Say 10 or 100 times that, or more. I don't remember offhand how much I
>>>> increased those numbers on my topology.
>>>>
>>>> How many bytes are you writing per minute to Kafka? Try dumping 1
>>>> minute of messages to a file to figure out how many bytes that is.
>>>> I am reading (sending data to the topic) about 100,000 records per
>>>> second. My kafka consumer can consume the 3 million records in less than
>>>> 50 seconds. I have disabled the ack. But with the ack enabled, I won't even
>>>> get 1500 records per second from the topology. With ack disabled, I get
>>>> about 12000/second.
>>>> I don't lose any data, it is just that the data is emitted from the spout
>>>> to the bolt very slowly.
>>>>
>>>>  I did bump my buffer sizes but I am not sure if they are sufficient.
>>>>
>>>>     topology.transfer.buffer.size: 2048
>>>>     topology.executor.buffer.size: 65536
>>>>     topology.receiver.buffer.size: 16
>>>>     topology.executor.send.buffer.size: 65536
>>>>
>>>>     kafka.fetch.size.bytes: 102400
>>>>     kafka.buffer.size.bytes: 102400
>>>>
>>>> thanks
>>>> Clay
>>>>
>>>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <
>>>> filipa.mendesmoura@gmail.com> wrote:
>>>>
>>>>> Can you share a screenshot of the Storm UI for your spout?
>>>>>
>>>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <clayteahouse@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have this issue with any amount of load. Different max spout
>>>>>> pending values do not seem to make much of a difference. I've lowered
>>>>>> this parameter to 100, and it still makes little difference. At this
>>>>>> point the bolt consuming the data does no processing.
>>>>>>
>>>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <haralds@evilezh.net>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm not sure that I understand your problem, but here are a few
>>>>>>> points:
>>>>>>> If you have a large max spout pending and slow processing, you will
>>>>>>> probably see large latency at the kafka spout. The spout emits a
>>>>>>> message, it stays in the queue for a long time (which adds latency),
>>>>>>> and finally it is processed and the ack is received. You will see
>>>>>>> queue time + processing time in the kafka spout latency.
>>>>>>> Take a look at the load factors of your bolts (are they close to 1
>>>>>>> or more?) and at the load factor of the kafka spout.
>>>>>>>
>>>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <
>>>>>>> andrey.yegorov@gmail.com> wrote:
>>>>>>>
>>>>>>>> Have you tried increasing the max spout pending parameter for the
>>>>>>>> spout?
>>>>>>>>
>>>>>>>> builder.setSpout("kafka",
>>>>>>>>                        new KafkaSpout(spoutConfig),
>>>>>>>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>>           // the maximum parallelism you can have on a KafkaSpout is
>>>>>>>>           // the number of partitions
>>>>>>>>           .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>>>
>>>>>>>> ----------
>>>>>>>> Andrey Yegorov
>>>>>>>>
>>>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <
>>>>>>>> clayteahouse@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> In my topology, the kafka spout is responsible for over 85% of the
>>>>>>>>> latency. I have tried different max spout pending values and played
>>>>>>>>> with the buffer size and fetch size, still no luck. Any hints on how
>>>>>>>>> to optimize the spout? The issue doesn't seem to be on the kafka
>>>>>>>>> side, as I see high throughput with the simple kafka consumer.
>>>>>>>>>
>>>>>>>>> thank you for your feedback
>>>>>>>>> Clay
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
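
A rough back-of-the-envelope check of the figures quoted in this thread, as a minimal Java sketch. It assumes that "12MB" means 12 * 1024 * 1024 bytes, that the spout keeps its pending window full so Little's law (time in flight = tuples in flight / throughput) is a fair approximation, and that the max spout pending of 100 and the roughly 1500 acked tuples/second belong to the same run, which the thread does not confirm. The class and method names are hypothetical and are not part of Storm or the Kafka spout.

```java
public class SpoutEstimate {

    /** Little's law: average time in flight (ms) = tuples in flight / throughput. */
    public static double timeInFlightMs(int pendingTuples, double tuplesPerSecond) {
        return 1000.0 * pendingTuples / tuplesPerSecond;
    }

    /** Average record size in bytes from a byte total and a record count. */
    public static double bytesPerRecord(long totalBytes, long records) {
        return (double) totalBytes / records;
    }

    /** Data rate in bytes per second for a given tuple rate and record size. */
    public static double bytesPerSecond(double tuplesPerSecond, double bytesPerRecord) {
        return tuplesPerSecond * bytesPerRecord;
    }

    public static void main(String[] args) {
        // Figures quoted in the thread; "12MB" is assumed to be 12 MiB.
        long totalBytes = 12L * 1024 * 1024;
        double recordSize = bytesPerRecord(totalBytes, 100_000);

        System.out.printf("bytes per record: %.1f%n", recordSize);
        System.out.printf("data rate at 23000 tuples/s: %.0f bytes/s%n",
                bytesPerSecond(23_000, recordSize));

        // Assumed pairing: max spout pending of 100 with about 1500 acked tuples/s.
        System.out.printf("time in flight: %.1f ms%n", timeInFlightMs(100, 1_500));
    }
}
```

Under these assumptions a record is about 126 bytes, so 23000 tuples/second is only about 2.9 MB/second, far below the 12 MB fetch and buffer sizes. A pending window of 100 at 1500 tuples/second would correspond to roughly 67 ms in flight per tuple.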
