storm-user mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: Bolt slow down the emit speed of spout
Date Wed, 19 Oct 2016 01:47:49 GMT
You need to make it clear whether you mean multiple 'topics' or multiple
'partitions'.
A single partition cannot be read in parallel; that is by design in Kafka.
If you mean topics, you can increase the parallelism of the spout to match the
partition count.
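As a minimal sketch of why extra spout executors do not help: the Kafka spout distributes partitions across spout tasks round-robin, so any task whose index is beyond the partition count owns no partitions at all. The class and method names below are illustrative, not the actual spout code:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignment {
    // Returns the partition ids owned by one spout task under
    // round-robin assignment: partition p goes to task (p % totalTasks).
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int numPartitions) {
        List<Integer> owned = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p % totalTasks == taskIndex) {
                owned.add(p);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // 6 partitions, 2 spout tasks: each task reads 3 partitions.
        System.out.println(partitionsForTask(0, 2, 6)); // [0, 2, 4]
        // 6 partitions, 8 spout tasks: task 7 owns nothing and sits idle.
        System.out.println(partitionsForTask(7, 8, 6)); // []
    }
}
```

This is why parallelism above the partition count only adds idle executors rather than read throughput.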

On Wed, 19 Oct 2016 at 10:42 AM, Chen Junfeng <k-2feng@hotmail.com> wrote:

> Thanks for your reply.
>
>
>
> Maybe I am close to the reason. I found that the CPU usage of the spout
> worker is very high, about 200% to 400% according to the 'top' command. But
> KafkaSpout in Storm does not seem to support using multiple spout instances
> to read one topic.
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
>
>
> *From:* Jungtaek Lim <kabhwan@gmail.com>
> *Sent:* October 19, 2016 9:18
> *To:* Chien Le <Chien.Le@ds-iq.com>; user@storm.apache.org
> *Subject:* Re: Bolt slow down the emit speed of spout
>
>
> Could you change your bolt to a no-op that only acks, and run again? That
> would make your benchmark clearer.
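Such an ack-only bolt can be sketched as follows, assuming the standard Storm 1.x API; `BaseBasicBolt` acks each tuple automatically after `execute` returns, and the class name here is illustrative:

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

// A bolt that does no work at all: BaseBasicBolt acks every tuple
// automatically once execute() returns, so this isolates the cost of
// ack tracking itself from any real processing in the bolt.
public class NoOpAckBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Intentionally empty: ack only, no processing, no emit.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing, so there are no output fields.
    }
}
```

Swapping this in for the real bolt shows how much of the slowdown comes from the acking/transfer machinery alone.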
>
> And if you picked your numbers from the first 10 minutes, could you take
> them from a later 10-minute window instead? That excludes worker and
> executor deploy/startup time, as well as JVM warm-up.
>
> - Jungtaek Lim (HeartSaVioR)
> On Wed, 19 Oct 2016 at 10:00 AM Chen Junfeng <k-2feng@hotmail.com> wrote:
>
> I used 'dstat -a', which measures CPU, memory, disk and network. Disk should
> not be the bottleneck, as my topology does little disk I/O; log.info writes
> a tuple to disk only once every 1000 tuples.
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
>
>
> *From:* Chien Le <Chien.Le@ds-iq.com>
> *Sent:* October 19, 2016 5:39
> *To:* user@storm.apache.org
> *Subject:* RE: Bolt slow down the emit speed of spout
>
>
>
> dstat -d is measuring your disk throughput, yes? Did you mean dstat -n?
>
>
>
> Otherwise, dstat -d showing 300MB/s doesn't eliminate disk as your
> bottleneck. I would try using iostat -x 1 and see what your %util and svc
> times are to truly eliminate disk as the bottleneck.
>
>
>
> -Chien
>
>
>
> *From:* Chen Junfeng [mailto:k-2feng@hotmail.com]
> *Sent:* Monday, October 17, 2016 11:30 PM
> *To:* user@storm.apache.org
> *Subject:* Bolt slow down the emit speed of spout
>
>
>
>
>
> I found a tough but interesting issue with Storm 1.0.2 to share with you.
>
> First, let me introduce my system structure. My topology has 471 workers, 6
> Kafka topics and some processing bolts. Backpressure has been disabled in
> the topology with:
>
> conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
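For context, this setting sits next to the other throughput knobs in the topology config. A minimal sketch using the standard Storm 1.x `Config` API follows; the max-spout-pending value is illustrative, not taken from the original topology:

```java
import org.apache.storm.Config;

Config conf = new Config();
// Disable automatic backpressure, as in the topology under test.
conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
// With backpressure off, max spout pending is the remaining throttle on
// in-flight (unacked) tuples per spout task; leaving it unset means unbounded.
conf.setMaxSpoutPending(10000);
conf.setNumWorkers(471);
```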
>
>
>
> If I submit only the KafkaSpouts, the statistics from the Storm UI are as
> below:
>
> Topology stats
>
> | Window      | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
> |-------------|-----------|-------------|-----------------------|-----------|--------|
> | 10m 0s      | 875161429 | 0           | 0                     | 875141691 |        |
> | 3h 0m 0s    | 927117120 | 0           | 0                     | 927117320 |        |
> | 1d 0h 0m 0s | 927117120 | 0           | 0                     | 927117320 |        |
> | All time    | 927117120 | 0           | 0                     | 927117320 |        |
>
> Spouts (10m 0s)
>
> | Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
> |------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
> | Spout_JSCTGW_144 | 2         | 2     | 128701261 | 0           | 0                     | 128702858 | 0      |
> | Spout_JSCTGW_145 | 2         | 2     | 162347454 | 0           | 0                     | 162347639 | 0      |
> | Spout_JSCTGW_146 | 2         | 2     | 135598494 | 0           | 0                     | 135598608 | 0      |
> | Spout_JSCTGW_147 | 2         | 2     | 128307822 | 0           | 0                     | 128306102 | 0      |
> | Spout_JSCTGW_148 | 2         | 2     | 160369513 | 0           | 0                     | 160360423 | 0      |
> | Spout_JSCTGW_149 | 2         | 2     | 159836885 | 0           | 0                     | 159826061 | 0      |
>
>
>
>
>
> In 10 minutes, the 6 KafkaSpouts read more than 800 million tuples from the
> Kafka cluster.
>
> Then I submitted the topology with a simple PrintBolt; the 10-minute stats
> are as follows:
>
> Spouts (10m 0s)
>
> | Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed | Error Host | Error Port |
> |------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|------------|------------|
> | Spout_JSCTGW_144 | 2         | 2     | 113599789 | 113599789   | 0                     | 113513630 | 0      |            |            |
> | Spout_JSCTGW_145 | 2         | 2     | 122501522 | 122501575   | 0                     | 122659659 | 0      |            |            |
> | Spout_JSCTGW_146 | 2         | 2     | 91308915  | 91308915    | 0                     | 91308652  | 0      |            |            |
> | Spout_JSCTGW_147 | 2         | 2     | 105029568 | 105029568   | 0                     | 104968275 | 0      |            |            |
> | Spout_JSCTGW_148 | 2         | 2     | 115889172 | 115889178   | 0                     | 115890165 | 0      |            |            |
> | Spout_JSCTGW_149 | 2         | 2     | 115185591 | 115185591   | 0                     | 115185638 | 0      |            |            |
>
> Total emitted: *663526019*
>
> Bolts (10m 0s)
>
> | Id        | Executors | Tasks | Emitted | Transferred | Capacity (last 10m) | Execute latency (ms) | Executed  | Process latency (ms) | Acked     |
> |-----------|-----------|-------|---------|-------------|---------------------|----------------------|-----------|----------------------|-----------|
> | PrintBolt | 240       | 240   | 0       | 0           | 0.241               | 0.041                | 665063824 | 0                    | 665036902 |
>
>
>
> The PrintBolt contains nothing but a simple log.info call, *but the total
> number of tuples emitted by the spouts decreased to about 600 million*. All
> of the machines have 32 cores, but the system load average never exceeds 15.
> I have also checked the network load: my network adapter has 10 Gb
> bandwidth, and the dstat -d command shows that the maximum receive or send
> speed is 300 MB/s at most, which does not reach the adapter's limit. The
> total number of executors is less than the total number of workers, so
> worker capacity should not be the problem.
>
>
>
> Then I did some more tests.
>
> Only one KafkaSpout reading one topic:
>
>
>
> Spouts (10m 0s)
>
> | Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
> |------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
> | Spout_JSCTGW_145 | 2         | 2     | 148918096 | 0           | 0                     | 148917805 | 0      |
>
>
> One KafkaSpout and one data-processing bolt:
>
>
>
> Spouts (10m 0s)
>
> | Id               | Executors | Tasks | Emitted   | Transferred | Complete latency (ms) | Acked     | Failed |
> |------------------|-----------|-------|-----------|-------------|-----------------------|-----------|--------|
> | Spout_JSCTGW_145 | 2         | 2     | 106481751 | 106481751   | 0                     | 106367801 | 0      |
>
> Bolts (10m 0s)
>
> | Id | Executors | Tasks | Emitted | Transferred | Capacity (last 10m) |
>
