storm-user mailing list archives

From Chi Hoang <...@groupon.com>
Subject Re: Optimizing Kafka Stream
Date Sat, 31 May 2014 18:50:15 GMT
Raphael,
You can try tuning your spout parallelism and the number of workers.

For Kafka 0.7, your spout parallelism maxes out at (# of brokers) x (# of
partitions for the topic).  If you have 4 Kafka brokers and your topic
has 5 partitions, you could set the spout parallelism to 20 to
maximize throughput.

For Kafka 0.8+, your spout parallelism could max out at # partitions for
the topic, so if your topic has 5 partitions, then you would set the spout
parallelism to 5.  To increase parallelism, you would need to increase the
number of partitions for your topic (by using the add partitions utility).
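
The two caps above can be written down as a small calculation.  This is
only a sketch of the arithmetic: the class and method names are
hypothetical and are not part of Storm or Kafka, and it assumes that any
spout parallelism requested above the cap adds no further throughput.

```java
// Sketch only: SpoutParallelism and its methods are hypothetical helpers,
// not Storm or Kafka APIs. They encode the two rules of thumb above.
final class SpoutParallelism {
    private SpoutParallelism() {}

    // Kafka 0.7: the cap is (# of brokers) x (# of partitions for the topic).
    static int maxForKafka07(int brokers, int partitions) {
        requirePositive(brokers, "brokers");
        requirePositive(partitions, "partitions");
        return Math.multiplyExact(brokers, partitions);
    }

    // Kafka 0.8+: the cap is the number of partitions for the topic.
    static int maxForKafka08(int topicPartitions) {
        requirePositive(topicPartitions, "topicPartitions");
        return topicPartitions;
    }

    // Assumption: a hint above the cap is treated as the cap itself.
    static int effectiveParallelism(int requestedHint, int cap) {
        requirePositive(requestedHint, "requestedHint");
        requirePositive(cap, "cap");
        return Math.min(requestedHint, cap);
    }

    private static void requirePositive(int value, String name) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive: " + value);
        }
    }

    public static void main(String[] args) {
        // The examples from the text: 4 brokers and 5 partitions.
        System.out.println("Kafka 0.7 cap:  " + maxForKafka07(4, 5));
        System.out.println("Kafka 0.8+ cap: " + maxForKafka08(5));
        // Asking for 20 on Kafka 0.8+ with 5 partitions still yields 5.
        System.out.println("Effective:      " + effectiveParallelism(20, maxForKafka08(5)));
    }
}
```

With 4 brokers and 5 partitions this gives a cap of 20 on Kafka 0.7 and 5
on Kafka 0.8+, matching the numbers above.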

As for the number of workers, setting it to 1 means that your spout will
only run on a single Storm node, where it will likely share resources with
other Storm processes (spouts and bolts).  I recommend increasing the
number of workers so that Storm has a chance to spread out the work and
keep a good balance.
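
The data points in the quoted message below point the same way: going
from 1 worker to 2 cut the batch time from 6s to 3s, while a third worker
made no further difference.  A rough way to compare those measurements
with the 66,000 tuples per second target is sketched here.  The class and
method names are hypothetical, and the calculation assumes one batch in
flight at a time (maxSpoutPending = 1), so that throughput is simply
batch size divided by batch time.

```java
// Sketch only: ThroughputCheck is a hypothetical helper, not a Storm API.
// Assumption: one batch in flight, so throughput = tuples / seconds.
final class ThroughputCheck {
    private ThroughputCheck() {}

    // Tuples processed per second for a given batch size and batch time.
    static double tuplesPerSecond(long tuplesPerBatch, double secondsPerBatch) {
        if (tuplesPerBatch < 0 || secondsPerBatch <= 0) {
            throw new IllegalArgumentException("need tuples >= 0 and seconds > 0");
        }
        return tuplesPerBatch / secondsPerBatch;
    }

    // How many times faster the topology must get to reach the target rate.
    static double requiredSpeedup(double targetPerSecond, double measuredPerSecond) {
        if (targetPerSecond <= 0 || measuredPerSecond <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return targetPerSecond / measuredPerSecond;
    }

    public static void main(String[] args) {
        // Figures taken from the quoted message: 83K tuples per batch,
        // 6s with 1 worker, 3s with 2 or 3 workers, target 66,000 tuples/s.
        double oneWorker = tuplesPerSecond(83_000, 6.0);
        double twoWorkers = tuplesPerSecond(83_000, 3.0);
        double target = 66_000.0;

        System.out.printf("1 worker:  %.0f tuples/s%n", oneWorker);
        System.out.printf("2 workers: %.0f tuples/s%n", twoWorkers);
        System.out.printf("Speedup still needed: %.2fx%n",
                requiredSpeedup(target, twoWorkers));
    }
}
```

Under that assumption the best measured case is roughly 27,700 tuples per
second, so the topology would still need to get about 2.4 times faster to
reach the target.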

Hope this helps.

Chi


On Fri, May 30, 2014 at 4:24 PM, Raphael Hsieh <raffihsieh@gmail.com> wrote:

> I am in the process of optimizing my stream. Currently I expect 5 000 000
> tuples to come out of my spout per minute. I am trying to beef up my
> topology in order to process this in real time without falling behind.
>
> For some reason my batch size is capping out at 83 thousand tuples. I
> can't seem to make it any bigger. The processing time doesn't seem to get
> any smaller than 2-3 seconds either.
> I'm not sure how to configure the topology to get any faster / more
> efficient.
>
> Currently all the topology does is a groupby on time and an aggregation
> (Count) to aggregate everything.
>
> Here are some data points I've figured out.
>
> Batch size: 5 MB
> num-workers: 1
> parallelismHint: 2
> (I'll write this as 5mb, 1, 2)
>
> 5mb, 1, 2 = 83K tuples / 6s
> 10mb, 1, 2 = 83K tuples / 7s
> 5mb, 1, 4 = 83K tuples / 6s
> 5mb, 2, 4 = 83K tuples / 3s
> 5mb, 3, 6 = 83K tuples / 3s
> 10mb, 3, 6 = 83K tuples / 3s
>
> Can anybody help me figure out how to get it to process things faster?
>
> My maxSpoutPending is at 1, but when I increased it to 2 it was the same.
> MessageTimeoutSec = 100
>
> I've been following this blog: https://gist.github.com/mrflip/5958028
> to an extent, not everything word for word though.
>
> I need to be able to process around 66,000 tuples per second and I'm
> starting to run out of ideas.
>
> Thanks
>
> --
> Raphael Hsieh
>
>
>
