storm-user mailing list archives

From "Nick R. Katsipoulakis" <nick.kat...@gmail.com>
Subject Re: Approach to parallelism
Date Mon, 05 Oct 2015 15:34:17 GMT
Well, did you also see the capacity of your bolts increasing? Because if
that is the case, it means your use case is compute-bound and your workers
end up with increased latency.
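
(For context, the "capacity" value shown in the Storm UI is computed
roughly as: executed tuples in the window x average execute latency /
window length. Values approaching 1.0 mean a bolt's executors are busy
nearly all of the time, which usually points to a compute-bound component.)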

Nick

On Mon, Oct 5, 2015 at 11:32 AM, Kashyap Mhaisekar <kashyap.m@gmail.com>
wrote:

> After dropping to 1 worker per node, my latency was bad.... Probably my
> use case was different.
>
> On Mon, Oct 5, 2015 at 10:29 AM, Nick R. Katsipoulakis <
> nick.katsip@gmail.com> wrote:
>
>> Hello guys,
>>
>> This is a really interesting discussion. I am also trying to fine-tune
>> the performance of my cluster and especially my end-to-end-latency which
>> ranges from 200-1200 msec for a topology with 2 spouts (each one with 2k
>> tuples per second input rate) and 3 bolts. My cluster consists of 3
>> zookeeper nodes (1 shared with nimbus) and 6 supervisor nodes, all of them
>> being AWS m4.xlarge instances.
>>
>> I am pretty sure that the latency I am experiencing is ridiculous and I
>> currently have no ideas what to do to improve that. I have 3 workers per
>> node, which I will drop it to one worker per node after this discussion and
>> see if I have better results.
>>
>> Thanks,
>> Nick
>>
>> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar <kashyap.m@gmail.com>
>> wrote:
>>
>>> Anshu,
>>> My methodology was as follows. Since the true parallelism of a machine
>>> is the number of cores, I set the workers equal to the number of cores (5
>>> in my case). That said, since we have 32 GB per box, we usually leave 50%
>>> unused, leaving us 16 GB spread across the 5 workers. Hence we set the
>>> worker heap at 3g.
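>>>
>>> As a rough sketch, that sizing might be expressed at submit time along
>>> these lines (assuming the pre-1.0 backtype.storm package names; the
>>> topology name and the elided spout/bolt wiring are placeholders):
>>>
>>>   import backtype.storm.Config;
>>>   import backtype.storm.StormSubmitter;
>>>   import backtype.storm.topology.TopologyBuilder;
>>>
>>>   public class SizingSketch {
>>>       public static void main(String[] args) throws Exception {
>>>           TopologyBuilder builder = new TopologyBuilder();
>>>           // ... spout/bolt declarations elided (a topology needs at
>>>           // least one spout before it can actually be submitted) ...
>>>
>>>           Config conf = new Config();
>>>           // worker count and heap taken from the description above
>>>           conf.setNumWorkers(5);
>>>           conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx3g");
>>>           StormSubmitter.submitTopology("sizing-sketch", conf,
>>>                   builder.createTopology());
>>>       }
>>>   }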
>>>
>>> This was before Javier's and Michael's suggestion of keeping one JVM per
>>> node...
>>>
>>> Ours is a single topology running on these boxes, so I will change it to
>>> one JVM (worker) per box and rerun.
>>>
>>> Thanks
>>> Kashyap
>>>
>>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla <anshushukla0@gmail.com>
>>> wrote:
>>>
>>>> Sorry for reposting! Any suggestions, please?
>>>>
>>>> Just one query: how can we map
>>>> 1. the number of workers to the number of cores, and
>>>> 2. the number of slots on one machine to the number of cores on that machine?
>>>>
>>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost <soozandjohnyost@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Javier,
>>>>>
>>>>> Gotcha, I am seeing the same thing, and I see a ton of worker restarts
>>>>> as well.
>>>>>
>>>>> Thanks
>>>>>
>>>>> --John
>>>>>
>>>>> On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez <jagonzal@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I don't have numbers, but I did see a very noticeable degradation of
>>>>>> throughput and latency when using multiple workers per node with the
>>>>>> same topology.
>>>>>> On Oct 5, 2015 7:25 AM, "John Yost" <soozandjohnyost@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> I am curious--are there any benchmark numbers that demonstrate how
>>>>>>> much better one worker per node is?  The reason I ask is that I may
>>>>>>> need to double up the workers on my cluster, and I was wondering how
>>>>>>> much of a throughput hit I may take from having two workers per node.
>>>>>>>
>>>>>>> Any info would be very much appreciated--thanks! :)
>>>>>>>
>>>>>>> --John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <jagonzal@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I would suggest sticking with a single worker per machine. It makes
>>>>>>>> memory allocation easier and it makes inter-component communication
>>>>>>>> much more efficient. Configure the executors with your parallelism
>>>>>>>> hints to take advantage of all your available CPU cores.
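>>>>>>>>
>>>>>>>> A minimal sketch of that layout, reusing the imports from the sizing
>>>>>>>> sketch further up; MySpout, MyBolt, and the numbers are illustrative
>>>>>>>> placeholders for a 5-node cluster with 8 cores per node:
>>>>>>>>
>>>>>>>>   TopologyBuilder builder = new TopologyBuilder();
>>>>>>>>   builder.setSpout("spout", new MySpout(), 5);  // one spout executor per node
>>>>>>>>   builder.setBolt("bolt", new MyBolt(), 40)     // enough executors to use the cores
>>>>>>>>          .shuffleGrouping("spout");
>>>>>>>>
>>>>>>>>   Config conf = new Config();
>>>>>>>>   conf.setNumWorkers(5);   // one worker (JVM) per machine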
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> JG
>>>>>>>>
>>>>>>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <
>>>>>>>> kashyap.m@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I was trying to come up with an approach to evaluate the
>>>>>>>>> parallelism needed for a topology.
>>>>>>>>>
>>>>>>>>> Assume I have 5 machines with 8 cores and 32 GB each, and my
>>>>>>>>> topology has one spout and 5 bolts.
>>>>>>>>>
>>>>>>>>> 1. Define one worker port per CPU core to start off (= 8 workers
>>>>>>>>> per machine, i.e. 40 workers overall).
>>>>>>>>> 2. Each worker spawns one executor per component, which translates
>>>>>>>>> to 6 executors per worker, i.e. 40 x 6 = 240 executors.
>>>>>>>>> 3. Of these, if the bolt logic is CPU intensive, leave the
>>>>>>>>> parallelism hint at 40 (the total number of workers); else increase
>>>>>>>>> the parallelism hint beyond 40 until you hit a number beyond which
>>>>>>>>> there is no further visible performance gain (see the sketch below).
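>>>>>>>>>
>>>>>>>>> A sketch of how those numbers would plug in, continuing the earlier
>>>>>>>>> code fragments (component classes are placeholders; only one of the
>>>>>>>>> 5 bolts is shown):
>>>>>>>>>
>>>>>>>>>   Config conf = new Config();
>>>>>>>>>   conf.setNumWorkers(40);                       // step 1: 8 ports x 5 machines
>>>>>>>>>
>>>>>>>>>   TopologyBuilder builder = new TopologyBuilder();
>>>>>>>>>   builder.setSpout("spout", new MySpout(), 40); // steps 2-3: hint of 40 per component
>>>>>>>>>   builder.setBolt("bolt-1", new MyBolt1(), 40)  // 6 components x 40 = 240 executors
>>>>>>>>>          .shuffleGrouping("spout");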
>>>>>>>>>
>>>>>>>>> Does this look right?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Kashyap
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Javier González Nicolini
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Anshu Shukla
>>>>
>>>
>>>
>>
>>
>> --
>> Nikolaos Romanos Katsipoulakis,
>> University of Pittsburgh, PhD student
>>
>
>


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD student
