storm-user mailing list archives

From Brian Taylor <br...@resolvingarchitecture.com>
Subject Re: 回复: RE: How to Improve Storm Application's Throughput
Date Wed, 09 Aug 2017 12:42:34 GMT

On Aug 9, 2017, at 8:40 AM, "Hannum, Daniel" <Daniel_Hannum@PremierInc.com>
wrote:
>I think the problem is that capacity of 3.5. That indicates a backlog
>on that bolt: the actual time spent processing in the bolt is small,
>but the total time spent (including wait time) is large. Scale the
>bolts up, scale the spout down, or make the bolt faster.
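
A minimal sketch of how the Capacity figure can be read, assuming the usual Storm UI definition (tuples executed times average execute latency, divided by the length of the measurement window). The class and method names are hypothetical, and the sample numbers in `main` are made up for illustration rather than taken from the screenshots in this thread.

```java
/** Back-of-the-envelope helpers for reading the Storm UI "Capacity" column. */
final class CapacityEstimate {

    /**
     * Fraction of the window an executor spent inside execute().
     * executed:         tuples executed during the window
     * executeLatencyMs: average execute latency in milliseconds
     * windowMs:         window length in milliseconds (600_000 for a 10-minute window)
     */
    static double capacity(long executed, double executeLatencyMs, long windowMs) {
        return executed * executeLatencyMs / windowMs;
    }

    /**
     * Rough number of executors needed so that each one stays at or below
     * targetCapacity for a given input rate (assumes tuples are spread evenly).
     */
    static int executorsNeeded(double tuplesPerSecond, double executeLatencyMs,
                               double targetCapacity) {
        // Seconds of execute() work that arrive per wall-clock second.
        double busyFraction = tuplesPerSecond * executeLatencyMs / 1000.0;
        return (int) Math.ceil(busyFraction / targetCapacity);
    }

    public static void main(String[] args) {
        // Hypothetical executor: 1.2M tuples at 0.5 ms each within 10 minutes.
        System.out.println(capacity(1_200_000L, 0.5, 600_000L));
        // Hypothetical load: 20k tuples/s at 0.5 ms each, aiming for capacity 0.8.
        System.out.println(executorsNeeded(20_000.0, 0.5, 0.8));
    }
}
```

A value near or above 1.0 means the executor was busy for essentially the whole window, which is when "scale the bolts up" or "make the bolt faster" applies.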
>
>From: "fanxinpu@travelsky.com" <fanxinpu@travelsky.com>
>Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>Date: Tuesday, August 8, 2017 at 11:02 PM
>To: user <user@storm.apache.org>, libo19 <libo19@asiainfo.com>, kabhwan
><kabhwan@gmail.com>
>Subject: 回复: RE: How to Improve Storm Application's Throughput
>
>Hi LiBo, Jungtaek :
>
>Yes, Storm tuning depends on the situation. Thank you for your kind
>advice.
>The following is one of my situations; any hints from you will be
>appreciated. The Storm version is 1.0.0.
>The topology has just one spout, which reads messages from Kafka, and
>one bolt, which parses each message and puts it into HBase.
>[cid:image001.png@01D310E7.19BFB400]
>As you can see from the picture above, the Execute latency of the bolt
>is small (0.5 ms), but the Complete latency is much larger (4365 ms),
>which slows down the throughput of the topology.
>Which part consumes so much additional time: the transfer between
>the spout and the bolt, or the acking? I tried increasing the
>parallelism of the components, but it did not help.
>Is there a tool to analyze where the time goes in general? It would be
>great to know of one.
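
One way to reason about the gap between the two latencies is Little's law: in steady state, the time a tuple spends in the system equals the number of tuples in flight divided by the rate at which they are acked. The sketch below is an assumption-laden estimate, not a measurement tool; the class and method names are hypothetical, and the in-flight count and ack rate in `main` are invented. Only the 4365 ms and 0.5 ms figures come from the message above.

```java
/** Rough Little's-law estimate of where Complete latency comes from. */
final class CompleteLatencyEstimate {

    /**
     * Little's law, W = L / lambda, returned in milliseconds.
     * tuplesInFlight: emitted but not yet acked tuples (hypothetical input)
     * ackedPerSecond: steady-state ack rate of the topology
     */
    static double completeLatencyMs(long tuplesInFlight, double ackedPerSecond) {
        return tuplesInFlight / ackedPerSecond * 1000.0;
    }

    /** Share of the Complete latency that is NOT spent inside execute(). */
    static double waitingShare(double completeLatencyMs, double executeLatencyMs) {
        return (completeLatencyMs - executeLatencyMs) / completeLatencyMs;
    }

    public static void main(String[] args) {
        // Hypothetical: 10,000 tuples pending while 2,000 are acked per second.
        System.out.println(completeLatencyMs(10_000L, 2_000.0) + " ms");
        // Figures from the message: 4365 ms complete vs 0.5 ms execute.
        System.out.println(waitingShare(4365.0, 0.5));
    }
}
```

Under these assumptions nearly all of the Complete latency is time spent waiting in queues or for acks, so lowering the number of in-flight tuples or raising the ack rate changes it far more than shaving the execute time.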
>
>There is another thing to explain in the picture above. The Capacity
>appears to be as high as 1.617, but there are 64 bolts, and the
>Capacity of most of them is low, as the pictures below show.
>[cid:image002.png@01D310E7.19BFB400]
>[cid:image003.png@01D310E7.19BFB400]
>So another puzzle is that the Execute latency and the Executed count
>are about equal across them, yet the Capacity turns out to be so
>different. Any hints?
>
>The following is another topology.
>[cid:image004.png@01D310E7.19BFB400]
>Here the history_Put bolt has both a high Capacity and a larger Execute
>latency. Could that be what leads to the Complete latency of 56964 ms?
>
>Thank you all for your time.
>
>
>________________________________
>Joshua
>
>
>
>From: 李波<mailto:libo19@asiainfo.com>
>Sent: 2017-08-07 16:56
>To: user@storm.apache.org<mailto:user@storm.apache.org>
>Cc:
>jiangyc_cui_gd@si-tech.com.cn<mailto:jiangyc_cui_gd@si-tech.com.cn>;
>'zhangxl_bds'<mailto:zhangxl_bds@si-tech.com.cn>
>Subject: RE: How to Improve Storm Application's Throughput
>Hello!
>
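>Troubleshooting Storm performance takes repeated experiments until you
>converge on empirically good values. My own troubleshooting process is
>as follows; I hope it helps: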
>
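>1. Kafka partition number and KafkaSpout's parallelism
>First, determine whether the delay comes from the KafkaSpout itself not
>receiving fast enough, or from downstream bolts with limited processing
>capacity becoming congested and backing up the upstream KafkaSpout.
>If the KafkaSpout cannot receive fast enough, increase the number of
>Kafka partitions and set the KafkaSpout parallelism equal to the number
>of partitions, raising both step by step until you reach the required
>throughput.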
>
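>2. Bolt's business logic and Bolt's parallelism
>First, check whether the hardware resources are exhausted.
>Next, find out which bolt is causing the congestion. Storm's topology
>visualization can help: the redder a bolt, the worse its congestion.
>Also look at the Capacity value: a bolt above 1 will be congested to
>some degree.
>Optimize the processing logic to improve efficiency, or raise the
>bolt's processing capacity by increasing its parallelism (provided the
>hardware resources have not reached their limit).
>
>From your use case, it looks like the last few bolts access an external
>data store to persist the final results. Check whether those reads and
>writes have high latency and whether the target system is under heavy
>load.
>
>In most cases the cause is insufficient bolt processing capacity. You
>need to find the pressure point, optimize the business logic, and
>adjust the bolt parallelism at the same time.
>Check whether the bolt's logic is compute-intensive or depends on
>external I/O; if it is compute-intensive, the only option is to add
>workers.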
>
>3. Config.TOPOLOGY_BACKPRESSURE_ENABLE
>Check whether the backpressure mechanism,
>Config.TOPOLOGY_BACKPRESSURE_ENABLE, is enabled (I believe it is only
>available from Storm 1.0.0 onward).
>
>4. Netty
>In addition, to raise Storm's throughput at the transport layer, you
>can change the Netty settings in storm.yaml to increase Netty's send
>batch size.
>
>5. Executor's throughput params
>// Network I/O settings
>config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024 * 16); // default is 1024
>config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 1024 * 16); // batched; default is 1024
>config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 1024 * 16); // individual tuples; default is 1024
>config.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 200);
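
The settings in items 3 and 5 can be sketched without a Storm dependency by using a plain map and the string keys that, to my knowledge, sit behind the corresponding `Config` constants in Storm 1.x. The class name and helper methods are hypothetical. The power-of-two check is an assumption on my part: as far as I know, the Disruptor-backed executor queues in Storm 1.x expect power-of-two sizes, which `1024 * 16` satisfies.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: build the buffer and backpressure settings as a plain map. */
final class BufferConfigSketch {

    /** True when n is a positive power of two (for example 1024 or 16384). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /**
     * Returns topology settings keyed by Storm 1.x config names.
     * The result could be copied into a Storm Config with putAll().
     */
    static Map<String, Object> throughputConfig(int bufferSize) {
        if (!isPowerOfTwo(bufferSize)) {
            throw new IllegalArgumentException(
                "buffer size should be a power of two: " + bufferSize);
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.transfer.buffer.size", bufferSize);
        conf.put("topology.executor.receive.buffer.size", bufferSize);
        conf.put("topology.executor.send.buffer.size", bufferSize);
        conf.put("topology.backpressure.enable", true);
        return conf;
    }

    public static void main(String[] args) {
        // Same size as in the message above: 1024 * 16 = 16384.
        System.out.println(throughputConfig(1024 * 16));
    }
}
```

Larger buffers trade memory and latency for throughput, so it is worth changing one value at a time and watching Capacity and Complete latency in the UI after each change.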
>
>
>________________________________
>李波 13813887096 libo19@asiainfo.com<mailto:libo19@asiainfo.com>
>北京亚信智慧数据科技有限公司
>AsiaInfo is my home; its growth depends on everyone
>
>From: 王 纯超 [mailto:wangchunchao@outlook.com]
>Sent: August 7, 2017 10:58
>To: user <user@storm.apache.org>
>Cc: 姜艳春jiangyc_cui_gd@si-tech.com.cn <jiangyc_cui_gd@si-tech.com.cn>;
>zhangxl_bds <zhangxl_bds@si-tech.com.cn>
>Subject: How to Improve Storm Application's Throughput
>
>Hi,
>
>I am looking into improving a Storm application's throughput because I
>find that the KafkaSpout consumes messages more slowly than they are
>produced, and the lag keeps growing. Below are the bolt
>statistics. I tried moving the tuple projection and filtering
>logic forward into a custom scheme, intending to reduce network traffic.
>However, after observation, the result was the opposite of what I
>hoped. Am I going the wrong way? Are there any principles for tuning
>Storm applications? Or could anyone give some suggestions for this
>specific case?
>[cid:image005.jpg@01D310E7.19BFB400]
>________________________________
>wangchunchao@outlook.com<mailto:wangchunchao@outlook.com>
