storm-user mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: Bolt slow down the emit speed of spout
Date Wed, 19 Oct 2016 01:18:23 GMT
Could you change your bolt to a no-op that only acks, and run it again? That
would make your benchmark clearer.

Also, if you are taking your numbers from the first 10 minutes, could you take
them from a later 10-minute window instead? That excludes worker/executor
deploy and startup time, as well as JVM warm-up.
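
A minimal sketch of such a no-op-but-ack bolt might look like this (my
illustration only, assuming the Storm 1.x org.apache.storm API; the class name
is made up):

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class NoOpAckBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // Do no work at all; just ack so the acker path is still exercised.
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing downstream.
    }
}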

- Jungtaek Lim (HeartSaVioR)
On Wed, 19 Oct 2016 at 10:00 AM Chen Junfeng <k-2feng@hotmail.com> wrote:

> I used ‘dstat -a’, which measures CPU, memory, disk and network. Disk
> should not be the bottleneck, as my topology does not do much disk I/O;
> log.info writes a tuple to disk only once every 1000 tuples.
>
>
>
> Sent from the Mail app for Windows 10 <https://go.microsoft.com/fwlink/?LinkId=550986>
>
>
>
> *From:* Chien Le <Chien.Le@ds-iq.com>
> *Sent:* October 19, 2016 5:39
> *To:* user@storm.apache.org
> *Subject:* RE: Bolt slow down the emit speed of spout
>
>
>
> dstat -d is measuring your disk throughput, yes? Did you mean dstat -n?
>
>
>
> Otherwise, dstat -d showing 300MB/s doesn't eliminate disk as your
> bottleneck. I would try using iostat -x 1 and see what your %util and svc
> times are to truly eliminate disk as the bottleneck.
>
>
>
> -Chien
>
>
>
> *From:* Chen Junfeng [mailto:k-2feng@hotmail.com]
> *Sent:* Monday, October 17, 2016 11:30 PM
> *To:* user@storm.apache.org
> *Subject:* Bolt slow down the emit speed of spout
>
>
>
>
>
> I found a tough but interesting issue with Storm 1.0.2 that I would like to share with you.
>
> First, let me introduce my system structure. My topology has 471 workers, 6
> Kafka topics and some processor bolts. Backpressure has been disabled in the
> topology via:
>
> conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
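>
> A minimal sketch of how this flag fits into the overall topology setup
> (simplified, not my full topology; the ZooKeeper address, topic name and
> component names below are illustrative only):
>
> import org.apache.storm.Config;
> import org.apache.storm.StormSubmitter;
> import org.apache.storm.kafka.KafkaSpout;
> import org.apache.storm.kafka.SpoutConfig;
> import org.apache.storm.kafka.ZkHosts;
> import org.apache.storm.topology.TopologyBuilder;
>
> public class TopologySketch {
>     public static void main(String[] args) throws Exception {
>         // Illustrative ZooKeeper/topic settings for one of the six topics.
>         SpoutConfig spoutConfig = new SpoutConfig(
>                 new ZkHosts("zk1:2181"), "JSCTGW_145", "/kafka-spout", "Spout_JSCTGW_145");
>
>         TopologyBuilder builder = new TopologyBuilder();
>         // One KafkaSpout per topic, 2 executors each (only one topic shown here).
>         builder.setSpout("Spout_JSCTGW_145", new KafkaSpout(spoutConfig), 2);
>
>         Config conf = new Config();
>         conf.setNumWorkers(471);                              // worker count from above
>         conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false); // backpressure disabled
>
>         StormSubmitter.submitTopology("kafka-benchmark", conf, builder.createTopology());
>     }
> }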
>
>
>
> If I submit KafkaSpout only, the statistic data from Storm UI is as below:
>
> Topology stats
>
>   Window        Emitted     Transferred   Complete latency (ms)   Acked       Failed
>   10m 0s        875161429   0             0                       875141691
>   3h 0m 0s      927117120   0             0                       927117320
>   1d 0h 0m 0s   927117120   0             0                       927117320
>   All time      927117120   0             0                       927117320
>
> Spouts (10m 0s)
>
>   Id                 Executors   Tasks   Emitted     Transferred   Complete latency (ms)   Acked       Failed
>   Spout_JSCTGW_144   2           2       128701261   0             0                       128702858   0
>   Spout_JSCTGW_145   2           2       162347454   0             0                       162347639   0
>   Spout_JSCTGW_146   2           2       135598494   0             0                       135598608   0
>   Spout_JSCTGW_147   2           2       128307822   0             0                       128306102   0
>   Spout_JSCTGW_148   2           2       160369513   0             0                       160360423   0
>   Spout_JSCTGW_149   2           2       159836885   0             0                       159826061   0
>
>
>
>
>
> In 10 minutes, the 6 KafkaSpouts read more than 800 million tuples from the
> Kafka cluster.
>
> Then I submitted the topology with a simple PrintBolt, and the 10-minute
> stats are as follows:
>
> Spouts (10m 0s)
>
>   Id                 Executors   Tasks   Emitted     Transferred   Complete latency (ms)   Acked       Failed   Error Host   Error Port
>   Spout_JSCTGW_144   2           2       113599789   113599789     0                       113513630   0
>   Spout_JSCTGW_145   2           2       122501522   122501575     0                       122659659   0
>   Spout_JSCTGW_146   2           2       91308915    91308915      0                       91308652    0
>   Spout_JSCTGW_147   2           2       105029568   105029568     0                       104968275   0
>   Spout_JSCTGW_148   2           2       115889172   115889178     0                       115890165   0
>   Spout_JSCTGW_149   2           2       115185591   115185591     0                       115185638   0
>
> Showing 1 to 6 of 6 entries (total emitted: *663526019*)
>
> Bolts (10m 0s)
>
>   Id          Executors   Tasks   Emitted   Transferred   Capacity (last 10m)   Execute latency (ms)   Executed    Process latency (ms)   Acked
>   PrintBolt   240         240     0         0             0.241                 0.041                  665063824   0                      665036902
>
>
>
> The PrintBolt contains nothing but a simple log.info call, *but the total
> number of tuples emitted by the spouts decreased to about 600 million*.
> All of the machines have 32 cores, yet the system load average never exceeds
> 15. I have also checked the network load: my network adapter has 10 Gb
> bandwidth, and dstat -d shows the receive/send rate peaks at about 300 MB/s,
> which does not reach the adapter's limit. The total number of executors is
> less than the total number of workers, so that should not be the performance
> problem.
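>
> For reference, a minimal sketch of what the PrintBolt does (simplified, not
> the actual class; I am assuming here that it extends BaseBasicBolt so that
> acking happens automatically): nothing but a log.info call on every 1000th
> tuple.
>
> import org.apache.storm.topology.BasicOutputCollector;
> import org.apache.storm.topology.OutputFieldsDeclarer;
> import org.apache.storm.topology.base.BaseBasicBolt;
> import org.apache.storm.tuple.Tuple;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class PrintBolt extends BaseBasicBolt {
>     private static final Logger LOG = LoggerFactory.getLogger(PrintBolt.class);
>     private long count = 0;
>
>     @Override
>     public void execute(Tuple tuple, BasicOutputCollector collector) {
>         // Log only every 1000th tuple; BaseBasicBolt acks each tuple automatically.
>         if (++count % 1000 == 0) {
>             LOG.info("tuple #{}: {}", count, tuple);
>         }
>     }
>
>     @Override
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         // Emits nothing downstream.
>     }
> }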
>
>
>
> Then I did some more tests.
>
> Only one KafkaSpout reading one topic:
>
>
>
> Spouts (10m 0s)
>
>   Id                 Executors   Tasks   Emitted     Transferred   Complete latency (ms)   Acked       Failed
>   Spout_JSCTGW_145   2           2       148918096   0             0                       148917805   0
>
>
> One KafkaSpout and one data-processing bolt:
>
>
>
> Spouts (10m 0s)
>
>   Id                 Executors   Tasks   Emitted     Transferred   Complete latency (ms)   Acked       Failed
>   Spout_JSCTGW_145   2           2       106481751   106481751     0                       106367801   0
>
> Showing 1 to 1 of 1 entries
>
> Bolts (10m 0s)
>
>   Id               Executors   Tasks   Emitted     Transferred   Capacity (last 10m)   Execute latency (ms)   Executed
>   AtomPluginBolt   1000        1000    132752203   0             0.019                 0.067                  106519630
>
>
>
>
> Even when I read only one Kafka topic, the emit speed still slows down.
>
>
>
> Has anyone else met the same problem?
>
>
>
>
>
