storm-user mailing list archives

From Vijay Patil <vijay2110.t...@gmail.com>
Subject Re: max parallelism values
Date Tue, 28 Feb 2017 04:54:50 GMT
Hi Pradeep,

Ideally you should start with just 1 worker and the minimum number of spout
threads (1 or 2; you will need more if latency is high while consuming from
SQS and committing back to it).
Use local-or-shuffle grouping (localOrShuffleGrouping) to reduce traffic
across Storm nodes.
Try to reach maximum throughput with this setup by tuning the bolt
parallelism and the max spout pending knob (topology.max.spout.pending).
That tells you how much throughput a single worker can deliver; record the
parallelism values that got you there as your per-worker baseline.
Storm topologies generally scale horizontally, so keep adding workers and
increase parallelism in the same proportion.
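
To make the scaling step concrete, here is a minimal sketch in plain Java
(no Storm dependency). ParallelismPlan, the component names and the
per-worker numbers are all hypothetical placeholders, not measured values;
it simply multiplies a single-worker baseline by the number of workers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: scales a tuned single-worker
// baseline linearly to N workers.
final class ParallelismPlan {

    // Multiply each component's per-worker executor count by the worker count.
    static Map<String, Integer> scale(Map<String, Integer> perWorker, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        Map<String, Integer> total = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : perWorker.entrySet()) {
            total.put(e.getKey(), e.getValue() * workers);
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder baseline: values you would find by tuning one worker.
        Map<String, Integer> baseline = new LinkedHashMap<>();
        baseline.put("sqs-spout", 2);
        baseline.put("s3-bolt", 4);
        baseline.put("rds-bolt", 6);

        // Scale the baseline to a 5-worker topology and print the totals.
        System.out.println(scale(baseline, 5));
    }
}
```

The totals would then be used as the parallelism hints when the topology is
built. Re-measure throughput after each increase, since linear scaling is
only an expectation and SQS, S3 or the database may become the bottleneck.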

Regards,
Vijay

On 28 February 2017 at 10:03, pradeep s <sreekumar.pradeep@gmail.com> wrote:

> I am not consuming from Kafka. SQS is a queue service from AWS. I am
> consuming from SQS and writing to AWS S3 and an AWS RDS database.
> How can we arrive at the parallelism for spouts and bolts that gives the
> maximum throughput?
>
>
> On Mon, Feb 27, 2017 at 11:02 AM, Thomas Cristanis <
> thomascristanis@gmail.com> wrote:
>
>> I do not know how the mechanisms you mentioned (SQS, S3 and the database)
>> behave, but for Kafka it is recommended to set the parallelism hint
>> according to the number of Kafka partitions
>> <https://kafka.apache.org/documentation/>.
>> If the parallelism hint exceeds the number of Kafka partitions, the extra
>> executors sit idle.
>> I have seen recommendations from Hortonworks <http://hortonworks.com/blog/>
>> that the number of executors should not exceed the number of cores.
>>
>>
>> --
>> Thomas Cristanis
>>
>> 2017-02-27 15:05 GMT-03:00 pradeep s <sreekumar.pradeep@gmail.com>:
>>
>>> Hi,
>>> I am running a Storm cluster with 5 worker nodes on EC2 m4.2xlarge
>>> machines. Each machine has 8 cores. I have a spout that consumes from
>>> SQS and two bolts, one writing to S3 and the other to a database.
>>> What is the maximum parallelism value I can assign to the spout and
>>> the bolts?
>>> Since I have 8*5 = 40 cores in total, is 40 the maximum parallelism I can
>>> give to the spout and the bolts?
>>> Also, if I assign a parallelism hint of 40 to the spout, can the bolt
>>> parallelism be the same?
>>> Regards
>>> Pradeep S
>>>
>>
>>
>
