storm-user mailing list archives

From Michael Rose <mich...@fullcontact.com>
Subject Re: kafkaspout is very slow
Date Wed, 04 Feb 2015 23:54:10 GMT
You might increase the number of ackers too if acking is slow.
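
A minimal sketch of that setting, under stated assumptions: Storm's topology configuration is a map of string keys to values, and the acker count lives under the key `topology.acker.executors`. The class name, helper name, and the values 4 and 1000 below are hypothetical placeholders, not numbers recommended anywhere in this thread. In a real topology the same entries would go into the `Config` object that is submitted with the topology.

```java
import java.util.HashMap;
import java.util.Map;

public class AckerConfigSketch {

    // Builds topology settings as a plain map (hypothetical helper).
    // The two keys are Storm's standard configuration keys; the values
    // are supplied by the caller and are assumptions for illustration.
    static Map<String, Object> buildConf(int ackers, int maxSpoutPending) {
        Map<String, Object> conf = new HashMap<>();
        // Number of acker executors; more ackers spread the acking work.
        conf.put("topology.acker.executors", ackers);
        // Upper bound on un-acked tuples per spout task.
        conf.put("topology.max.spout.pending", maxSpoutPending);
        return conf;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Object> conf = buildConf(4, 1000);
        System.out.println(conf);
    }
}
```

Raising the acker count only helps if acking is actually the bottleneck, so it is worth comparing throughput before and after the change.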

*Michael Rose*
Senior Platform Engineer
*Full*Contact | fullcontact.com
m: +1.720.837.1357 | t: @xorlev



On Wed, Feb 4, 2015 at 4:47 PM, Filipa Moura <filipa.mendesmoura@gmail.com>
wrote:

> I would bump these numbers up by a lot:
>
> kafka.fetch.size.bytes: 102400    kafka.buffer.size.bytes: 102400
>
> Say 10 or 100 times that or more. I don't know by heart how much I
> increased those numbers on my topology.
>
> How many bytes are you writing per minute to Kafka? Try dumping 1 minute
> of messages to a file to figure out how many bytes that is.
> I am reading (sending data to the topic) about 100,000 records per second.
> My Kafka consumer can consume the 3 million records in less than 50
> seconds. I have disabled the ack. But with the ack enabled, I won't even
> get 1500 records per second from the topology. With ack disabled, I get
> about 12000/second.
> I don't lose any data; it is just that the data is emitted from the spout
> to the bolt very slowly.
>
>  I did bump my buffer sizes but I am not sure if they are sufficient.
>
>     topology.transfer.buffer.size: 2048
>     topology.executor.buffer.size: 65536
>     topology.receiver.buffer.size: 16
>     topology.executor.send.buffer.size: 65536
>
>     kafka.fetch.size.bytes: 102400
>     kafka.buffer.size.bytes: 102400
>
> thanks
> Clay
>
> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <filipa.mendesmoura@gmail.com>
> wrote:
>
>> Can you share a screenshot of the Storm UI for your spout?
>>
>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <clayteahouse@gmail.com>
>> wrote:
>>
>>> I have this issue with any amount of load. Different max spout pending
>>> values do not seem to make much of a difference. I've lowered this
>>> parameter to 100, and there is still little difference. At this point the
>>> bolt consuming the data does no processing.
>>>
>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <haralds@evilezh.net>
>>> wrote:
>>>
>>>> I'm not sure that I understand your problem, but here are a few points:
>>>> If you have a large pending spout size and slow processing, you will
>>>> probably see large latency at the Kafka spout. The spout emits a message,
>>>> it stays in the queue for a long time (that adds latency), and finally it
>>>> is processed and the ack is received. You will see queue time + processing
>>>> time in the Kafka spout latency.
>>>> Take a look at the load factors of your bolts: are they close to 1 or
>>>> more? Also check the load factor of the Kafka spout.
>>>>
>>>> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yegorov@gmail.com>
>>>> wrote:
>>>>
>>>>> Have you tried increasing the max spout pending parameter for the spout?
>>>>>
>>>>> builder.setSpout("kafka",
>>>>>                        new KafkaSpout(spoutConfig),
>>>>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>           // the maximum parallelism you can have on a KafkaSpout is the number of partitions
>>>>>           .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>
>>>>> ----------
>>>>> Andrey Yegorov
>>>>>
>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteahouse@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> In my topology, kafka spout is responsible for over 85% of the
>>>>>> latency. I have tried different spout max pending and played with the
>>>>>> buffer size and fetch size, still no luck. Any hint on how to optimize
>>>>>> the spout? The issue doesn't seem to be with the kafka side, as I see
>>>>>> high throughput with the simple kafka consumer.
>>>>>>
>>>>>> thank you for your feedback
>>>>>> Clay
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
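
Filipa's suggestion in the thread above (measure one minute of traffic, then raise `kafka.fetch.size.bytes` by 10 or 100 times) can be turned into a rough estimate. The sketch below is only an illustration: it assumes traffic is spread evenly across partitions and that each fetch returns a full buffer. The bytes-per-minute figure and the partition count are hypothetical placeholders, since the thread does not state them. Only the 102400-byte starting value comes from the posted configuration.

```java
public class FetchSizeSketch {

    // Fetch size from the configuration posted in the thread.
    static final long CURRENT_FETCH_BYTES = 102_400L;

    // Estimates how many fetch requests per second each partition needs
    // in order to keep up with the measured write rate.
    static double fetchesPerSecond(long bytesPerMinute, int partitions, long fetchSizeBytes) {
        if (partitions <= 0 || fetchSizeBytes <= 0) {
            throw new IllegalArgumentException("partitions and fetch size must be positive");
        }
        double bytesPerSecondPerPartition = bytesPerMinute / 60.0 / partitions;
        return bytesPerSecondPerPartition / fetchSizeBytes;
    }

    public static void main(String[] args) {
        long bytesPerMinute = 600_000_000L; // hypothetical: size of a one-minute dump
        int partitions = 4;                 // hypothetical partition count

        // Compare the current fetch size with 10x and 100x, as suggested.
        for (long factor : new long[] {1, 10, 100}) {
            long fetchSize = CURRENT_FETCH_BYTES * factor;
            System.out.printf("fetch size %d bytes: %.1f fetches/s per partition%n",
                    fetchSize, fetchesPerSecond(bytesPerMinute, partitions, fetchSize));
        }
    }
}
```

A lower number of required fetches per second leaves more headroom per request, which is the reasoning behind raising the fetch size.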

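Haralds's point in the thread, that spout latency includes queue time as well as processing time, can be illustrated with Little's law: the average time in the system equals the number of items in flight divided by the throughput. The sketch below is a simplification and assumes a steady state in which the pending limit is fully used. The 1500 records per second rate is the acked rate Clay reports; the pending values are illustrative, and the class and method names are hypothetical.

```java
public class SpoutLatencySketch {

    // Little's law rearranged: time in system = items in flight / throughput.
    // Returns the estimated average latency in milliseconds.
    static double estimatedLatencyMs(int tuplesInFlight, double tuplesPerSecond) {
        if (tuplesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return 1000.0 * tuplesInFlight / tuplesPerSecond;
    }

    public static void main(String[] args) {
        double ackedRate = 1500.0; // records per second with acking, from the thread

        // Illustrative pending limits; 100 is the value Clay mentions trying.
        for (int pending : new int[] {100, 1000, 10000}) {
            System.out.printf("pending %d: about %.0f ms average latency%n",
                    pending, estimatedLatencyMs(pending, ackedRate));
        }
    }
}
```

Under these assumptions, 100 tuples in flight at 1500 records per second gives roughly 67 ms, and the estimate grows linearly with the pending limit, so a large limit can by itself account for high reported spout latency.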